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ABSTRACT 



A multitasking data processing system is provided with a 
hardware-configured operating system kernel. The system 
includes a processor queue that includes a plurality of word 
stores, each word store storing a task name, in execution 
priority order, that is ready for processing. An event queue 
in the kernel includes a plurality of word stores for storing 
task names that await the occurrence of an event to be placed 
in the processor queue. When an associated processor sig- 
nals the occurrence of an event, matching logic searches all 
word stores in the event queue, in parallel, to find a task 
associated with the signalled event and then transfers the 
task to the processor queue. Shift logic is also provided for 
simultaneously transferring a plurality of task names, in 
parallel, in the processor queue to make room for a task 
name transferred from the event queue. 

17 Claims, 7 Drawing Sheets 
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HARDWARE-CONFIGURED OPERATING 

SYSTEM KERNEL HAVING A 
PARALLEL-SEARCHABLE EVENT QUEUE 
FOR A MULTITASKING PROCESSOR 



CROSS REFERENCE TO RELATED 
APPLICATION 

This is a continuation of application Ser. No. 07/776,931 
filed on Oct. 15, 1991, now abandoned. 

FIELD OF THE INVENTION 

This invention relates to processors that are programmed 
to operate on many tasks in parallel, and more particularly, 
to a hardware implementation of a portion of the processor's 
operating system to enable multitasking operations to pro- 
ceed more rapidly. 

BACKGROUND OF THE INVENTION 

Real-time data processors are able to handle the process- 
ing of a number of tasks on a concurrent basis. Multipro- 
cessors perform this function by providing a number of 
processors that operate on the tasks in parallel. Single 
processor systems, operating in real-time, handle "parallel" 
processing on a multitasking basis. Multitasking is a process 
of executing several tasks concurrently or in parallel. The 
concurrently processed tasks execute logically at the same 
time, even though they may not execute physically at the 
same time. 

Multitasking operations perform under control of the 
computer's operating system and, in particular, under con- 
trol of an "Executive" segment thereof that controls the 
scheduling of the concurrently running tasks. The Executive 
segment (hereinafter called EXEC) provides an interface 
between the tasks and the central processing unit. To each 
task, the EXEC appears as the task's own central processing 
unit. The EXEC handles all of the details involved in sharing 
the physical CPU with all of the tasks in the system. Thus, 
an EXEC controls which task should have possession of the 
CPU at any time by examining priority and readiness levels 
assigned to each task. 

In general, the EXEC enables the running of one task on 
the CPU until another request to use the CPU is received 
from a higher priority task. The running task is "suspended", 
and the higher priority task is allowed to run. This process 
may occur many times over but sooner or later the highest 
priority task will complete its processing and voluntarily 
suspend itself. Thus, even though a lower priority task may 
be suspended many times before its completion, it eventu- 
ally completes and produces the same result as if it had run 
uninterrupted. 

Tasks have various execution states. An active task is one 
which has control of the CPU and is executing. Only one 
task can be active at any given time on a multitasking CPU. 
An inactive task is not executing and is not waiting to 
execute. It is simply idle. A ready task is one which is 
waiting for CPU time to become available so that it can 
execute. A waiting task is one that has suspended operation 
until the occurrence of some event. The event can be 
generated by another task, or a hardware event. Upon 
occurrence of the event, the task is moved from the waiting 
state to either the ready or the active state depending upon 
the priority of currently active tasks. 
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2 

Tasks may be synchronized by priority or by task readi- 
ness or a combination of both. In general, an EXEC includes 
a number of utility routines that are used by tasks to perform 
necessary control functions. Under these utility routines, any 
5 task may schedule another task, suspend itself or another 
task, signal an event, wait for an event or delay an event for 
a period of time. The principle EXEC utilities are as follows: 
Schedule, Suspend, Signal, Wait and Delay. The Schedule 
utility is used by an active task when it wants another task 
to begin or resume execution. The Schedule utility allows 
the highest priority ready task to execute. The Suspend 
utility is used by an active task to remove itself from the 
ready state or move another task to the inactive state. 
Control is then given to the current highest-priority ready 
task. 

15 The Signal utility is used by the active task or an interrupt 
service routine to generate a particular event. If no task is 
waiting on the event that is signalled, the EXEC returns 
control to the active task. If another task is waiting on the 
signalled event, control returns to the highest priority task. 

The Wait utility is used when the active task wishes to 
suspend execution and resume on an occurrence of a par- 
ticular event, or events. 
The Delay utility is used by the active task when it needs 

25 either itself or another task to wait a specific amount of time 
before resuming execution. This utility allows an asynchro- 
nous task to be synchronized using time as a reference. 

An operating system incorporates an interrupt facility 
which is the primary means for synchronizing firmware 

30 operations with hardware events. When an interrupt occurs, 
control is transferred from whatever task is executing to an 
interrupt handling module. One interrupt module exists for 
each type of interrupt that can occur. 
A variety of data structures are employed by the EXEC to 

35 implement the utilities above described. A CPU queue is a 
list of tasks in the ready state. The highest priority task is the 
currently active task. If there are two ready tasks of the same 
priority, the task that is ready first becomes the active task. 
An Event queue is a list of tasks in a waiting state. A Delay 

40 queue is another list of delayed events which, after the 
duration of a delay, will cause tasks waiting on these events 
to be moved to the CPU queue to become ready tasks. 

The described queues are dynamic in structure and must 
accommodate at least four basic operations: insertion of an 

45 item; deletion of an item; location of an item; and modifi- 
cation of an item. Each queue may be configured as a linear 
list, a singly linked list, or a doubly linked list. A singly 
linked list is one where each item contains a pointer to the 
next item in the list A doubly linked list has items with two 
linked fields. Each item in the list has a pointer to the next 
item to the right and to the next item left of it. 

As above indicated, an EXEC is generally configured in 
software as a portion ("kernel") of the operating system. 

55 Often, the EXEC is required to perform queue-wide opera- 
tions that include comparisons and shifting of the contents of 
the queue. Such operations are performed serially and their 
time of execution varies at least linearly with the length of 
the queue. Such execution times can have a deleterious 

60 affect on the performance of the operating system and 
presents real impediments to the improvement of the sys- 
tem's performance. 

Accordingly, it is an object of this invention to provide a 
multitasking data processing system with an EXEC that is 

65 not constrained by serial processing operations. 

It is another object of this invention to provide an EXEC 
that enables more efficient operation of a multitasking data 
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processing system. 

It is another object of this invention to provide a multi- 
tasking data processing system with a hardware imple- 
mented operating system kernel wherein queues maintained 
in the kernel require no separate priority indication. 5 

SUMMARY OF THE INVENTION 

A multitasking data processing system is provided with a I 
/hardware-configured operating system kernel. The system |10 
/ incjAdes a processor queue that includes a plurality of word 
is -jjtSres, each word store storing a task name, in execution 
(^rpriority order, that is ready for processing. An event queue 
in the kernel includes a plurality of word stores for storing 
task names that await the occurrence of an evemto be placed \15 
in the processor queue. When an associated processor sig- 
nals the occurrence of an event, matching logic searches all 
word stores in the ev ent queu e, in parallel, to find a task 
.'assSCtatea with ihe sighallecTevent and then lfansrers the 
'tail Oo^EhTprorosw'qn eue'. Sh'iTt logic is also provided tor 
simultaneously transferring a plurality of task names, in 
parallel, in the processor queue to make room for a task 
name transferred from the event queue. 

DESCRIPTION OF THE DRAWINGS 25 

FIG. 1 is a high level block diagram of a system incor- 
porating the invention hereof. 

FIG. 2 is a block diagram of a CPU queue used in the 
system of FIG. 1. 30 

FIG. 3 is a block diagram of each queue element within 
the CPU queue of FIG. 2. 

FIG. 4 is more detailed logic of a word store within a 
queue element word of FIG. 3. 

FIG. 5 is a circuit diagram of a content addressable 
memory cell. 

FIG. 6 is as block diagram of a set of event count registers 
used with the invention. 

FIG. 7 is a block diagram of a delay timer used with the 40 
invention. 

FIG. 8 is an exemplary set of read/write registers con- 
tained within the command and status registers shown in 
FIG. 1. 

45 

DETAILED DESCRIPTION OF THE 
INVENTION 

In brief, this invention removes certain critical operating 
system functions from operating system software and imple- 50 
ments them in a hardware structure. Through this mecha- 
nism, certain often-found data reorganization operations 
may be performed in parallel and in a predeterminedji me_ 
interval. T he hardware kernel employs, in tne main, rive 
commands, i.e., Sch edule, Suspend, Signal, Wait and Delay. 55 
Through these commands, and combinations thereof, a set o f 
real-time operating system primitives are created which 
I enabl e substantial improvement in an attached j jrcg essorV 
[ operating parameters. D ue to the hardware configuration of 
the queues in the keme l.j)ueue-wide operations are easily; 60 
acfro'fflptigfigfl in a fraction of the time n eeded for similar 
operatiohTin a software environment^ ' * 

EXEC MODULE (KERNEL) 

Turning now to FIG. 1, a high level block diagram is 
shown of the system. A microprocessor 10 has incorporated 65 
therewith, an EXEC module 12 which is implemented in 
hardware. EXEC module 12 contains three main functional 



blocks, i.e., a command and status register set 14, a queue 
state machine 16 and a queue system 18. Queue system 18 
contains the data structures that all utilities manipulate. 
There are three physically identical queues in queue system 
18, i.e., CPU queue 20, Delay queue 22 and Event queue 24. 

CPU queue 20 is a priority-order ed list pr names of Task 
ConffDTTJIbl^ITOT 

TCbs to Be placed in the CPU queue. CPU queue 20 controls 
the sequence that the various tasks execute within micro- 
processor 10. TCBs contain all the information pertaining to 
the status of a task and are used within microprocessor 10 to 
control such tasks. The name of each TCB is a value (e.g., 
from 0 to 255) which indicates its priority as well as 
designating the TCB. The priority of the task, referenced in 
the task's TCB name, is used to determine when a task is 
given possession of microprocessor 10. The TCB at the 
"head" of CPU Queue 20 retains possession of micropro- 
cessor 10 for that task. If a task of higher priority is placed 
in the queue, the currently running task is replaced by the 
higher priority task at the head of CPU Queue 20 and the 
task of higher priority executes. At any time, the number of 
tasks' TCBs in CPU Queue 20 may range from all (all tasks 
ready to execute), to none (no active or ready tasks). 

Event queue 24 contains a priority-ordered list of names 
of TCBs, with each TCB in the list joined to an event name, 
upon whose occurrence, the TCB will be moved to CPU / 
Queue 20. Delay Queue 22 contain s a list of ev ent names ,fr 
each event name associated with a delay which indicates the' 



amount of time before the associated event will be signalled^ 



The event names in Delay Qu eue 22 are prioritized, based 
upo n tneir associated delay values., 
^QOenT" system 18 also includes a set of event count 
registers 26 that keep track of the availability of a prede- 
termined number of possible events that can occur within the 
system. It is to be recalled that an event can be signaled and 
waited on by system tasks. The number of events that can be 
accommodated by the system is dictated by the size of an 
address and is not to be considered a limitation of the 
system. For purposes of description it will be assumed that 
the system can accommodate up to 256 possible events and 
utilizes an 8-bit address for any such event. 

Within command and status registers 14 is a command 
register that is written to by microprocessor 10 to start an 
EXEC utility. In addition, there are a number of registers that 
are "read-accessible" by microprocessor 10 and contain data 
produced by operations within EXEC module 12. A list of 
those registers will be found below. 

Queue state machine 16 controls the actions of EXEC 
module 12. Upon receipt from microprocessor 10 of an 
EXEC command into a register in command and status 
registers 14, queue state machine 16 executes the required 
actions on a queue or queues in queue system 18. Queue 
state machine .16 also performs required actions on registers 
in command and status registers 14 or, further, controls a 
timer 28 and a port control module 30. Port control module 
30 provides I/O command functions for EXEC module 12. 
Address and data connections between microprocessor 10 
and EXEC module 12 are provided directly through com- 
mand and status registers module 14 via lines 31 and 33, 
respectively. 

SIGNAL LINE DEFINITIONS. 

Signal lines existing between microprocessor 10 and 
EXEC module 12 are as follows: 
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Inputs to EXEC Module 12 

CS - chip select 

R/W - read/not (write) line 

uPAddr (0-7) - address lines 

SCLK - EXEC system clock (not shown) 

NRST - External Reset lice 

NTST - Test line (for test mode operation) 

Bidirectional lines to EXEC Module 12 

uP Data (0-15) - data lines 
Outputs from EXEC Module 12 

NDTACK - data transfer acknowledge 
(asynchronous acknowledge) 
NINT - interrupt line to CPU 



The following signals go between Queue State Machine 
16 and each of queues 20, 22 AND 24 via bus 32 (but are 
not specifically illustrated in FIG. 1). 

D (0-7 ) — EXEC Internal Data Bus 

ADDR (0-15) — EXEC Internal Address Bus 

CMD (0-3) — Command Lines to Queue system 18 

QEN (0-3) — Queue Command Enables (one for each 
queue and array) 

There are 256 queue elements in each queue. Each queue 
element stores two, 8-bit words. Therefore, 8 data lines and 
9 address lines are required to access any word in any queue. 
The ninth address line is the CMD(3) signal line. CMD(0-2) 
are used to specify which of 8 different commands will be 
performed. CMD(3) determines which of two queue element 
words the operation will be performed on. QEN(0-3) lines 
are used to enable only one of the queues. 

Commands can be sent to a particular queue to execute 
arithmetic comparisons. These arithmetic comparisons are 
done on all queue elements in parallel. The result of such 
comparisons is used to make parallel shifts in the queue in 
order to insert or delete items. This yields a significant speed 
improvement over manipulating these data as is typically 
done with pointers into a list. The following additional 
signals run from Queue State Machine 16 to the Event Count 
Registers 26 over Bus 32 (not specifically shown in FIG. 1) 

ECOP (0-l>— Event Counter Operations 

ECO C— Event Counter Output Control 

ECCLK — Event Counter Clock 

ECLR — Event Register Array Clear 

EOC— Event Register Output Control 

These signals are used to control 256 8-bit registers in 
Event count registers 26 and a counter therein which can be 
loaded with any of these values and incremented or decre- 
mented. 

Each register location in Event Count registers 26 is 
associated with a correspondingly numbered event. When a 
"Signal of a particular event occurs, the appropriate register 
value is read, incremented, and re-written. Likewise, when 
a Wait is executed on a particular event, the appropriate 
register's value is read, decremented, and re-written. 
QUEUE STRUCTURES 

T\irning now to FIG. 2, the basic structure of CPU queue 
20 is illustrated. It will be recalled, that the hardware 
structure of each queue is identical. TCB names are stored 
in CPU queue 20 in priority order. Rather than a assigning 
a separate priority value to a TCB name, it has been 
determined that substantial storage area can be conserved by 
assigning as a TCB name, the actual priority value assigned 
to the TCB. Thus, a TCB having the name 0 has the highest 
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priority value and is referred to by an address indication in 
EXEC module 12 by an all-Os address. Other TCBs of lower 
priority are similarly denoted. 
CPU queue 20 contains a plurality of queue elements 35. 

5 For illustration purposes, CPU queue 20 is shown with only 
four queue elements, instead of the 256 which can be used 
with an eight fait address. The structure of a queue element 
is shown in FIG. 3 and will be described in detail below. 
Data lines D(0-7) carry data into and out of data ports in 

10 each of queue elements 35 (see FIG. 2). CMD(0-3) are 
address lines that are used to specify which of eight different 
commands will be performed by each queue element 35. The 
CMD(3) line provides a level which determines which of 
two words in a queue element 35 the operation will be 

is performed upon. QEN(0) is an enable line and enables one 
of the three queues (20, 22, 24). 

An address bus ADDR (0-7) is decoded by address 
decoder 36 and is used to select an individual queue element 
35 (via an output on cable 38 which contains 256 separate 

20 word lines). Each of the word lines in word bus 38 is 
connected to an individual queue element 35 so as to enable 
logical operations in any one chosen queue element in 
accordance with a received address into address decoder 36. 
Such connections are not shown in FIG. 2 to avoid over- 

25 complication of the diagram. 

Address bus ADDR (0-7) also carries address information 
back to queue state machine 16 during comparison opera- 
tions. During such an operation, the address of a queue 
element 35 that meets a comparison criteria is returned on 

30 address bus ADDR (0-7). 

A resolver/encoder 40 resolves the case where more than 
one queue element 35 meets a comparison criteria. For 
instance, if a command is issued to return an address of a 
queue element that contains "numeral 0" and all queue 

35 elements contain "0", then resolver encoder 40 causes 
address bus ADDR 0-7 to return the lowest numbered queue 
location to queue state machine 16. In the example given, 
resolver encoder 40 would return location 1 on address bus 
ADDR (0-7). 

40 Each queue element 35 contains two words of information 
pertaining to a task. In general, queue element 35 at position 
1 in CPU queue 20 will contain two words pertaining to the 
highest-priority task awaiting action. Those words are the 
TCB name having the lowest numerical value (priority) and 

45 the name of the event that caused the TCB name to be moved 
into CPU queue 20. Queue elements at position 3, 4 etc. will 
contain TCB names with lower priority (and higher numeri- 
cal value). 

There are two local buses 42 and 44 that connect each 

50 queue element 35 to its next higher-numbered neighbor 
queue element and are used when inserting or deleting items 
from a queue. Each queue element 35 has two comparison 
operations incorporated into its structure, the results of 
which are manifest at GT output 46 and M output 48. The 

55 signals "GT' and "M" stand for "greater than" and "match", 
respectively. In brief, when data is applied to data bus 
D(0-7), each queue element 35 determines whether a data 
word contained within it matches in value or is greater in 
value in relation to the applied data word. If it finds that its 

60 stored word value is greater than the data bus value, then an 
output is impressed on output GT line 46. If a match is found 
between the values, M output 48 is energized. A multiplexer 
50 is connected to all queue elements 35 and is controlled to 
select either all "greater than" outputs 46 from queue 

65 elements 35 or all "match" outputs 48 from queue elements 
35. In the example shown in FIG. 2, multiplexer 50 provides 
four outputs from the selected greater than or match outputs 
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from the four queue elements 35 shown. Which group of 
signals is selected is dependent on whether it is desired to 
know the lowest-numbered queue element 35 that matches 
the input data or the lowest-numbered queue element 35 that 
holds a value greater than the input data. 
QUEUE ELEMENT STRUCTURE 

Turning now to FIG. 3, the structure of a queue element 
35 is illustrated. In each queue element 35, a pair of word 
stores 52 and 54 hold two 8-bit values, i.e., word A and word 
B. In addition, each word store contains the necessary logic 
to enable a comparison to be made between an incoming 
data word and the word contained therein. Details of that 
logic structure are shown in FIG. 4 and will be further 
considered below. 

A queue element 35 holds the following information: 



Queue Name 


Word A 


Word B 


CPU Queue 


Task Name/Priority 


Event Case 


Event Queue 


Task Name/Priority 


Event Name 


Delay Queue 


Delay Value 


Event Name 



20 



30 



The Task name/priority is manifest by aTCB name whose 
value is directly related to its priority (as above described). 
An "Event Case*' is the event name value that occurred that 25 
caused the task denoted by a TCB in word A, to be moved 
into CPU queue 20. An "Event Name" is a name or value 
given to a specific action within microprocessor 10. For 
instance, an event name may be a value assigned to a 
hardware interrupt, an I/O interrupt, etc. A "Delay Value" is 
a value assigned to a time before an event is to occur. 

Each word store 52, 54 has a GT and M output that is 
applied, respectively, to multiplexers 56 and 58. The CMD 
(3) line is applied to multiplexers 56 and 58 and its level 
selects which word (word A or word B) is compared to 35 
provide the desired output on either GT line 46 or M line 48. 

LAST busses 42 and 44 and NEXT busses 42' and 44' are 
used to shift words between queue elements 35. Each word 
store 52, 54 has its own LAST and NEXT bus that allows the 
contents of an entire queue element 35 to be shifted in one 40 
parallel operation. The contents of a word store 52 and a 
word store 54 always occupy the same queue element 35 
together. 

WORD STORE STRUCTURE 

In FIG. 4, the details of a word store 52 is illustrated. A 45 
multiplexer 70 receives four sources of input data for word 
store 52 (word A). Those sources are as follows: 

LAST A(0-7)— Contents of next lower Queue Element 
Word 

D(0-7)— EXEC Internal Data Bus 50 
NEXT A(0-7) — Contents of next higher Queue element 
Word A 

VDD — used to precharge BIT and nBIT (complement) 
lines during a comparison in a content-addressable 55 
memory 72. 

Content addressable memory (CAM) 72 contains eight, 
parallel connected bit cells for word A. Each bit cell pro- 
vides both true and complement outputs for its respective 
bit. A pair of multiplexers 74 and 76 enable a comparison 60 
operation to be carried out with respect to the 8-bit word 
held in CAM 72. 

The details of a one-bit CAM ceil are shown in FIG. 5. Bit 
lines 82 and 84 are separate, but the word lines 81 and 83 of 
all eight one-bit CAM cells are tied together (not shown) to 65 
form a common 8-bit word line. The basic memory cell is a 
CMOS flip-flop 85 of known structure. To write into cell 85, 



data is placed on bit line 82 and complement data is placed 
on complement bit line 84. Then, word lines 81, 83 arc 
asserted causing cell 80 to switch in accordance with the 
applied inputs. A read operation commences by prccharging 
bit and bit complement lines 82 and 84. Word lines 81, 83 
are then asserted and bit line 82 manifests the value of the 
cell. 

For a comparison operation, true data against which 
comparison is to be made is applied to complementary bit 
line 84 and complement data is placed on bit line 82. If the 
applied data matches that in cell 85, then a match transistor 
86 remains in a nonconductive state, thereby enabling match 
line 88 to be unaffected by the state in cell 80. There is one 
match line 88 in each memory cell. 

Returning to FIG. 4, a queue element logic module 73 
receives a plurality of commands over CMD(0-2) lines 
decodes each command using wired logic and accordingly 
energizes one or more of its output lines S1-S5, E and W. 
The output lines from queue element logic module 73 are 
applied to multiplexers 70, 74, 75, content addressable word 
memory 72 and a de- multiplexer 76 to enable a received 
command to be performed. 

A comparator 77 examines the Match line outputs from 
CAM 72 and data inputs appearing on data bus D(0-7) to 
determine if received data has a greater value than that 
stored in word store 72. Comparator 77 includes an exclu- 
sive NOR input stage (not shown) which reconstructs the 
stored data word from the Match line outputs and enables a 
subsequent magnitude comparison of the input values. If the 
input data is less in value than the stored word, the GT 
output from comparator 77 is asserted. If the input data is 
equal in value to the stored word, AND gate 78 asserts its M 
output 

QUEUE COMMANDS 

Each word store 52/54 responds to the following com- 
mands: 

Match= — return highest-numbered location containing 
the compared word 

Match> — return highest-numbered location containing 
the next higher value than the compared word 

SR — shift contents of queue right (to next higher-num- 
bered location) 

SL — shift contents of queue left (to next lower-numbered 
location) 

RD — read contents of addressed Word Store onto D(0-7) 
LD— load D(0-7) into addressed Word Store 
Init — initialize all queue element words to value=all l's. 
The "Match" commands are comparison commands and 
are executed in every queue element in a queue with respect 
to a selected word store (word A or word B). The RD and LD 
commands are only executed on one addressed word in a 
queue. The SR and SL commands both contain addresses 
which indicate a queue element word store from which data 
shifting is to occur. Thus, SR and SL shift commands only 
execute within queue elements whose address location is 
equal to or greater than the specified address. The selective 
shift operation is enabled by feeding the specified address 
from Queue State machine 16 to address decoder 36 (FIG. 
2) in the respective queue. Address decoder 36 responds to 
the address by activating the required word lines on bus 38 
which feed into each queue element (e.g., Word (0) which 
feeds into queue element logic module 73 in FIG. 4). 

The Init command is used to initialize all word stores in 
a queue to the 1 state. This value indicates that a queue 
element is empty. 
Returning to FIG. 4, the operations of Word store 52 in 
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response to the Match =, RD, LD and SR commands will be 
considered. 

The Match =command is initiated by appropriate values 
being emplaced on CMD(0-2) and QEN(O) lines flowing 
into queue element logic module 77. Those values indicate 5 
a Match =command for the specific queue in which word 
store 52 is present The data value to be matched is presented 
to word store 52 on the D(0-7) lines from the data bus. 
Queue element logic module 73 activates output signals SI 
and S2 to cause multiplexers 70 to pass the D(0-7) inputs to 10 
multiplexers 74, 75. Signal S3 from queue element logic 
module 73 is asserted and selects D(0-7) to flow to 
NBIT(0-7) input to word store 72. Signal S3 further causes 
the complement of D(0-7) to flow to the BIT(0-7) input to 
word store 72. If D(0-7) matches the value held in word 15 
store 72, resulting outputs on the Match lines cause AND 
gate 78 to assert Match output M. 

Assume now that a Read Contents (RD) instruction 
appears over the CMD(G-2) lines accompanied by a QEN(O) 
enable signal. Queue element logic module 73 interprets 20 
these levels as an RD- A (read) word A command. An applied 
level to word(O) input to queue element logic module 73 
indicates that this particular queue element's word A will be 
read onto data bus D(0-7). The read operation is initiated by 
pre-charging (raising to the logical 1 state) the bit lines to 25 
each bit cell in word store 72. This is accomplished by 
tri-stating outputs from multiplexers 74 and 76 with an 
assertion of the E line from queue element logic module 73. 
The word line to each bit cell in word store 72 is then 
asserted and the 8-bit values stored therein are read onto data 30 
bus D(0-7) through demultiplexer 76 under control of 
signals S4 and S5 from queue element logic module 73. 

A Load (LD) command is commenced by command 
values applied onto CMD(0-2) inputs to queue element 
logic module 73, accompanied by a QEN(O) enable signal. 35 
An assertion of the word(O) input to queue element logic 
module 73 indicates that this particular queue element word 
A will be loaded from data bus D(0-7). In response to the 
LD command, queue element logic module 73 asserts sig- 
nals SI and S2 to multiplexer 70 which, in turn, causes the 40 
D(0-7) inputs to be connected to the input of multiplexers 74 
and 75. Levels asserted on the S3 and S4 outputs from queue 
element logic module 73 then select the data values D(0-7) 
and their complements to be connected to bit (0-7) and 
NBIT (0-7) inputs to word store 72, respectively. Word line 45 
W from queue element logic module 73 is then asserted and 
causes storage of the data values into word store 72. 

To accomplish a Shift Right command (SR), the 
CMD(0-2) lines are asserted with the proper command 
values accompanied by an enable signal on QEN(0). Word 50 
line word (0) is asserted indicating that this particular queue 
element word A will be part of a partial or total queue 
position shift. An SR (or SL) command is a concatenation of 
the above described RD and LD commands. The only 
differences are logic signals S4 and S5 from queue element 55 
logic module 73 select NEXTA(0-7) as outputs of the read 
function and the SI and S2 outputs select LASTA(0-7) as 
inputs for the load (for the SR command). 

A parallel operation Shift Right is achieved by reading all 
queue element words simultaneously onto the separate 60 
NEXTA buses, and then simultaneously loading all queue 
element words from their separate LASTA buses. While not 
expressly shown, each queue elements NEXTA bus is con- 
nected to the next higher-position queue element's LASTA 
bus. From the above, it can thus be seen that shifts of data 65 
between queue elements occur in parallel and simulta- 
neously and may occur between selected queue elements or 



between all queue elements in a queue. 
EVENT COUNT REGISTERS 

EXEC module 12 includes 256 event count registers 26, 
one for each possible event (as limited by an 8-bit address). 
In FIG. 6, an 8-bit event counter 100 and the first three event 
count registers 102, 104 and 106 are shown. Event counter 
100 is programmable and responds to either an event count 
being loaded via data bus D(0-7) or to an event count from 
one of registers 102, 104 or 106. 

Each event count register has an assigned value indicative 
of one of 256 events which can occur in the system. If a 
register indicates a plus count, that is an indication that 
multiple tasks are waiting for the event to occur. If the event 
count register indicates a negative value, the indication is 
that the event has been signalled (occurred) more times than 
there are tasks waiting for the event's occurrence. Upon the 
occurrence of an event, the event count register correspond- 
ing to that event is examined to see the state of its count If 
the count is seen to be positive, then queue state machine 16 
knows that a task is present in event queue 24 and is awaiting 
the occurrence of the specific event. In such a case, event 
queue 24 is searched, in parallel, to find all TCBs that 
specify the specific event The highest priority TCB that 
specifies the event is then chosen for execution. Queue state 
machine 16 transfers the chosen TCB from event queue 24 
to CPU queue 20, where it is placed in priority order. When 
the task is removed from event queue 24, the value in the 
event count register is decremented through the action of 
counter 100. 

When an event occurs, the Signal utility causes the value 
of the corresponding event count register to be read and used 
to program counter 100. Then, counter 100 decrements the 
value, which decremented value is then written back into an 
appropriate register. A Wait utility causes the same sequence 
to occur with the exception that the value is decremented. 
The value of any event count register (102,104, 106, etc.) is 
available to be read by queue state machines 16 via data bus 
D(0-7). The following command lines from queue state 
machine 16 are applied to counter 100. 

ECOP(O-l) are used to specify whether counter 100 
should increment, decrement, load, or clear. 

ECOC is used to enable the output from counter 100 onto 
the data lines D(0-7) 

ECCLK is the clock used to cause an operation on counter 
100. 

ECLR is used to initialize all event count registers 102, 
104, 106, etc. to 0. 
DELAY TIMER 

In FIG. 7, the details of delay timer 28 in FIG. 1 are 
shown. Delay timer 28 counts system clock cycles and 
causes queue state machine 16 to signal any event whose 
delay is up. The number of clock cycles in a delay unit (i.e. 
a unit of delay time) is programmable and is held in a timer 
interval register within command and status registers 14. 
The delay unit value comes into an 8-bit counter 110 in delay 
timer 28 on TIR(0-7) lines from the timer interval register. 
8-bit counter 112 counts the number of delay units by 
accumulating the number of clock cycles in each delay unit 
and then incrementing to a next delay unit count. 8-bit 
counter. 112 counts up to 255 and then rolls over to 0. The 
Delay utility delays events by a relative time, not an absolute 
time. 

COMMAND AND STATUS REGISTERS 

With reference to FIG, 1, command and status registers 
comprise a number of microprocessor-accessible registers 
which receive 16 bit data (and commands) from micropro- 
cessor 10 and 8-bit data from various of the elements within 
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EXEC module 12. All registers are 16 bits in width and each 
register is either readable or writeable, but not both. For 
instance, in FIG. 8, examples of a write register 120 and read 
register 122 are shown along with their interconnections and 
control circuitry. Register 122 is a read register and receives 5 
data from EXEC module data bus D(0-7). Such data may 
subsequently be read out to microprocessor 10 upon appli- 
cation of enable signal ROEN(5). Similarly, write register 
120 receives data from microprocessor 10 and upon appli- 
cation of an enable signal RIEN(O), provides such data as an JQ 
output on data bus D(0-7). 

The following is a list of microprocessor accessible 
registers (not shown) within command and status registers 
14 and a description of the respective register's operation 
during execution of an associated utility operation. 

Schedule Register (a Write Register): When a Task TCB 15 
is written to this register, the TCB is placed in CPU Queue 
20 according to its priority (TCB value). 

Suspend Register (a Write Register): When a Task TCB is 
written to this register, the TCB is removed from CPU 
Queue 20; if it is indeed in CPU Queue 20. If the TCB is not 20 
in CPU Queue 20, an interrupt condition is generated to 
microprocessor 10. Status indicating this condition is set in 
a Status Register. 

Signal Register (a Write Register): When an Event control 
block is written to this register from microprocessor 10, this 25 
event is signalled. If no task is waiting for this event, the 
only action taken is to decrement the Event Count for this 
event. If tasks are waiting for this event, then the TCB with 
the highest priority (lowest TCB value) is removed from 
Event Queue 24 and placed in CPU Queue 20. 30 

Wait Register (a Write Register): When an Event control 
block is written to this register from microprocessor 10, the 
running task's TCB is placed in Event Queue 24 to wait for 
the named event. The running task* s TCB is found at the first 
queue position in CPU Queue 20 (position 0). This TCB is 35 
removed from CPU Queue 20 and placed in Event Queue 24 
according to its priority (TCB value). 

Delay Register (a Write Register): When an Event control 
block and an 8-bit delay value is written to this register from 
microprocessor 10, the Event control block name is placed 40 
into Delay Queue 22, prioritized by its delay value. The 
lower the delay, the closer to the head of the queue this block 
name is placed. 

Status Register (a Read Register): The Status register 
contains information on any abnormal conditions that occur 45 
within the EXEC chip. 

Timer Interval Register (a Write Register): The user 
writes a 16-bit value to this register which represents the 
number of system clocks that make up each delay value. 

CPUQ Register, Event Q Register, Delay Q Register 50 
(Read Registers): These registers, when read by micropro- 
cessor 10, will sequentially give the contents of all the 
Queue Element Words in the appropriate queue. These are 
Diagnostic Registers. 

Diagnostic Pointer Register (a Write Register): When 55 
written to with a value, the diagnostic registers above when 
read will be set to read from this location within the queues. 

Active TCB Register (a Read Register): When read, the 
TCBName of the current task that has highest priority in the 
CPU Queue is returned. 60 

Event Case Register (a Read Register): When read, the 
Event control block Name of the event that caused a task to 
resume execution is returned This value only has meaning 
if this task has executed a Wait utility call. 
QUEUE STATE MACHINE 65 

Queue state machine 16 controls the workings of EXEC 
Module 12. Queue state machine 16 is, in turn, controlled by 
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internal hardware that responds to commands written to 
command and status registers 14, to sequentially energize 
control lines to cause system operations in accordance with 
a specified command. In all respects queue state machine 16 
is conventional and its arrangement is known to those skilled 
in the art. 

EXEC MODULE OPERATION-SCHEDULE UTILITY 

Assume that microprocessor 10 writes a TCB of a task 
that it wishes to schedule, to the Schedule Register within 
command and status registers 14 in EXEC module 12. The 
TCB value is thereby latched within the Schedule register. In 
response, queue state machine 16 performs a parallel search 
of CPU queue 20 to find a stored next lower priority TCB 
from the TCB being scheduled. Once found, the TCBs 
within CPU 20 are shifted one position to the right, starting 
from the incoming TCB's queue position. The position 
within CPU 20 vacated by this rightward shift is loaded with 
the TCB of the task to be scheduled, 

EXEC module 12 now asserts a handshake signal 
NDTACK from Port control 30 to indicate that the transac- 
tion may now be terminated. In response, microprocessor 10 
de-asserts its CS signal to Port control 30, which signal 
precedes every transaction with EXEC module 12. 

CPU queue 20 has now been modified so that the TCB 
name of the task to be scheduled is present in CPU queue 20 
in its proper priority position. Microprocessor 10, to deter- 
mine what is now the highest-priority task awaiting execu- 
tion, reads the Active TCB register from command and 
status registers 14. Microprocessor 10 may also read the 
Event case register. The access by microprocessor 10 to the 
Active TCB register causes queue state machine 16 to read 
the TCB name from the first queue element (highest priority) 
in CPU queue 20 and to place this value onto microprocessor 
data lines DATA (0-15). Port control 30 then asserts the 
NDTACK line to indicate that the value of the active TCB 
register is available to microprocessor 10. 

It will be recalled that the Event case is the name of the 
event that caused a name of a task to be moved into CPU 
queue 20. Microprocessor 10 can access the Event case 
register to determine the event value within CPU queue 20. 
This access causes the event name to be read from the first 
queue element in CPU queue 20 into the Event case register 
and thence to be placed on microprocessor 10' s data bus 
DATA(0-15). The signal NDTACK is asserted indicating to 
microprocessor 10 that the value of this register is available 
on the output data bus. 

QUEUE STATE MACHINE OPERATION— SCHEDULE 
UTILITY 

The following are states that occur during a Schedule 
utility. Inputs to Queue state machine 16 are in lower case 
and outputs are uppercase. All events are synchronized to the 
system clock. For each state that is not exited until a 
particular event occurs, the event is indicated. If no event is 
specified for a given state, the state is exited upon the next 
state machine clock. 

1 . IDLE — EXEC chip select (cs) is not asserted, ntst is not 
asserted, nrst is not asserted, timer carry out (tco) is not 
asserted. Event — Chip select (cs) is asserted, CPU 
wishes to make an operating system call. 

2. CMD RCVD— an EXEC command is received. 
Actions — The schedule Register latches the value the 
CPU provides on the uP Address lines. This value is the 
TCBName of the task that is to be scheduled for 
execution. 

3. SETUP MATCH CMD— Begin the Schedule utility. 
Setup for a "Matchx" command. Actions — CMD(0-3) 
lines set up with the command "Match>A." D(0-7) 
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lines get the value in the Schedule Register. QEN(0-3) 
is set to the CPUQ value to indicate that this action 
takes place on the CPU Queue. 

4. FINISH MATCH CMD — Assert the "MATCH" line. 
Actions — De-assert last state's actions and assert the 
MATCH line. 

5. READ MATCH>ADD — Read address lines for highest 
priority match>address. Actions — Store the value on 
ADDR(0-7) in a temporary register, and de-assert the 
MATCH line. 

6. SETUP SR CMD — Setup for a shift right command. 
Actions— Put value in temporary register out on 
ADDR(0-7) lines. Set CMD(0-3) lines to "SR" (Shift 
Right) command, and set QEN(0-3) lines equal to 
CPUQ value (to indicate action on CPU Queue) . 

7. FINISH SR CMD — Finish the Shift Right command. 
Actions— De-assert last actions. 

8. SETUP LD CMD— Setup for a "LD" (load) command. 
Actions— CMD(0-3) lines gets the "LDA" command 
value. D(0-7) lines gets the contents of the Schedule 
Register. ADDR(0-7) lines get the value in the tempo- 
rary register. QEN(0-3) lines are set equal to the CPUQ 
value (to indicate action on the CPU Queue). 

8. FINISH LD CMD — Finish the load command. 
Actions— De-assert the actions of the last state. 

10. SETUP END uP XACTION (Setup to end transaction 
with CPU. Actions— DTACK (data transfer acknowl- 
edge) is asserted. Event— cs is de-asserted. 

11. END uP XACTION— End transaction with CPU. 
Actions — DTACK (data transfer acknowledge) is de- 
asserted. 

12. IDLE — GO to state 1 above. 

At this time, the task's TCB has been placed in the CPU 
Queue. In order to use the updated queue information, 
microprocessor 10 must execute a "READ 13 ACnVETCB" 
command. In addition, microprocessor 10 may execute a 
"READ 13 EVENTCASE" in order to determine the event 
that caused this task to resume execution. These actions are 
performed for any utility call including Schedule, Suspend, 
Signal, Wait, and Delay. 
READ t 3 ACHVETCB Command 

1. IDLE— EXEC chip select (cs), ntst, nrst, and tco (timer 
carry out) all are de-asserted. 

2. SETUP RD CMD— Setup for a "RD" (read) command. 
Actions— CMD(0-3) lines get the command "RDA" 
(read word A). The QEN(0-3) lines get the CPUQ 
value. ADDR(0-7) lines are set equal to 0 (for the first 
location in the CPU Queue — the Queue Head). 

3. WRITE ACTIVETCB REG— Write the Active TCB 
Register with the TCBName at the head of the CPU 
Queue. Actions — The Active TCB Register is written 
with the value now appearing on the D(0-7) lines. This 
is the TCBName of the next task to have use of the CPU 
resource by virtue of its priority. 

4. SETUP END uP XACTION— Setup to end transaction 
with CPU. Actions— DTACK (data transfer acknowl- 
edge) is asserted. Event — cs is de-asserted. 

5. END uP XACTION— End transaction with CPU. 
Actions — DTACK (data transfer acknowledge) is de- 
asserted. 

6. IDLE (state 1 above) 
READ_EVENTCASE Command 

1. IDLE — EXEC chip select (cs), ntst, nrst, and tco (timer 
carry out) all are de-asserted. 
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2. SETUP RD CMD-^-Setup a ''RD 1 ' (read) command. 
Actions— CMD(0-3) is set to the "RDB" (read word B) 
value. QEN(0-3) is set to the CPUQ value. 
ADDR(0-7) lines are set to 0, We wish to read word B 
of the first element in the CPU Queue 

3. WRITE EVENTCASE REG — Write the EventCase 
Register with the EventCase value of the next task to 
execute on the CPU. Actions— The EventCase Register 
is set equal to the value of the lines D(0-7). Actions of 
last state are de-asserted. 

4. SETUP END uP XACTION — Setup to end transaction 
with CPU. Actions— DTACK (data transfer acknowl- 
edge) is asserted. Event— cs is de-asserted. 

5. END uP XACTION— End transaction with CPU. 
Actions — DTACK (data transfer acknowledge) is de- 
asserted. 

6. IDLE (state 1 above) 

It should be understood that the foregoing description is 
only illustrative of the invention. Various alternatives and 
modifications can be devised by those skilled in the art 
without departing from the invention. Accordingly, the 
present invention is intended to embrace all such alterna- 
tives, modifications and variances which fall within the 
scope of the appended claims. 

I claim: 

1. A multitasking data processing system including a 
hardware-configured portion of an operating system, the 
combination comprising: 

processor queue means configured in hardware and 
including a plurality of word stores, for storing in 
priority order, task names ready for execution; 

event queue means configured in hardware and including 
a plurality of word stores, for storing task names that 
await an occurrence of an event to be placed in said 
processor queue means; processor means for signalling 
occurrence of an event; and 

match logic means configured in hardware and responsive 
to an asynchronously signalled event from said proces- 
sor means for searching said word stores in parallel in 
said event queue means to find a task name associated 
with said signalled event occurrence, and for transfer- 
ring said task name to said processor queue means. 

2. The multitasking data processing system recited in 
claim 1, further comprising: 

shift logic means for simultaneously transferring a plu- 
rality of task names from one set of word stores to 
another set of word stores in said processor queue 
means. 

3. The multitasking data processing system recited in 
claim 2, wherein said shift logic means includes means for 
simultaneously transferring said plurality of task names 
from word stores in a first direction along said queue or in 
the opposite direction along said queue, in dependence upon 
a generated command signal from said processor means. 

4. The multitasking data processing system as recited in 
claim 3, wherein said shift logic means, in response to an 
address of a said word store, causes said simultaneous 
transfer of task names to commence from said addressed 
word store and to include task names in additional word 
stores along said processor queue, starting from said 
addressed word store. 

5. A multitasking data processing system recited in claim 
1, further comprising: 

hardware event count means, including a plurality of 
registers, each register storing a count of tasks awaiting 
said asynchronously signalled event from said proces- 
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sor means; and 
search means responsive to said asynchronously signalled 
event to search said registers in parallel in said hard- 
ware event count means to determine whether a task is 
awaiting said asynchronously signalled event, and if so, 
operating said match logic means to search said event 
queue means to find a name of a task awaiting said 
asynchronously signalled event. 

6. A multitasking data processing system recited in claim 

5, wherein each said word store in said event queue means 
stores a task name and an associated event name, an occur- 
rence of said named event causing said named task to be 
transferred to said processor queue means. 

7. A multitasking data processing system recited in claim 

6, wherein task names in said event queue means are stored 
in a priority order assigned to each task, and a parallel search 
of said event queue means by said match logic means causes 
a readout of all task names associated with a signalled event, 
in parallel and in priority order. 

8. A multitasking data processing system recited in claim 
5, wherein said search means further determines, for a 
signalled event, if more events have occurred than there are 
tasks awaiting such event's occurrence. 

9. A multitasking data processing system as recited in 
claim 1 further comprising: 

hardware delay queue means including a plurality of word 
stores, each word store storing a delay interval value 
and an event to be signalled by said processor means 
upon occurrence of said delay interval value; 

timer means for signalling delay interval values; 

means responsive to a delay interval value signalled by 
said timer means for searching word stores in said 
delay queue means, in parallel, to find an event to be 
signalled upon a signalling of a delay interval value and 
to transmit said signalled event to said match logic 
means. 

10. A multitasking data processing system as recited in 
claim 1 wherein each said task name is represented by a 
unique value, said value also indicating said named task's 
priority among all tasks. 

11. A hardware-configured operating system kernel, com- 
prising: 

a set of command and status hardware registers for 
receiving both addresses and data from a connected 
data processing system and for passing data to said data 
processing system; 

a CPU queue including a plurality of hardware word 
stores for storing a queue of task names, said queue of 
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task names organized in priority order of said named 
tasks, all said tasks being ready for execution; 
a hardware-configured event queue including a plurality 
of word stores for storing a queue of task names in 
5 priority order, each said task name stored in association 
with an event name upon whose occurrence, said 
associated task name will be ready for execution; and 
a queue state machine responsive to asynchronously 
10 occurring event data in said command and status reg- 
isters, to cause a parallel search of said event queue to 
find all task names associated with said asynchronously 
occurring event data, and to enable a transfer of said 
task names to said CPU queue. 
15 12. The kernel as recited in claim 11, wherein said queue 
state machine causes said task names to be stored in said 
CPU queue in task name priority order. 

13. The kernel as recited in claim 12, wherein each said 
task name is a unique value, said value indicative of said 

20 named task's priority. 

14. The kernel as recited in claim 13, further comprising: 
a delay queue including a plurality of word stores for 

storing a queue of delay interval values and associated 
25 event names, said event names ordered in said delay 
queue by increasing value of associated delay interval 
values, said queue state machine responsive to a delay 
interval value manifestation to cause an associated 
event name to be signalled and task name transfers 
3Q from said event queue to be enabled? 

15. The kernel as recited in claim 14, further comprising: 
address means connected to each word store in each said 

queue for enabling, data to be selectively read from or 
written into any said word store. 

35 16. The kernel as recited in claim 15, further comprising: 
bus means connecting each said word store in each said 
queue for enabling simultaneous data transfers between 
selected word stores. 
17. The kernel as recited in claim 16, further comprising: 

40 a plurality of event registers, one for each of a plurality of 
events, each said event register indicating a count of 
tasks awaiting occurrence of an event, said queue state 
machine responsive to a signalled event to search all 
said event registers in parallel to determine if any tasks 

45 were awaiting occurrence of said signalled event, and 
if so, to cause transfer of a highest priority task name 
associated with said event in said event queue. 

***** 
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[57] ABSTRACT 

An arbitration procedure allowing processes and their asso- 
ciated processors to perform useful work while they have 
pending service requests for access to shared resources 
within a multi-processor system environment. The arbitra- 
tion procedure of the present invention is implemented 
within a multi-processor system (e.g.. a symmetric multi- 
processor system) wherein multiple processes can simulta- 
neously request "locks" which control access to shared 
resources such that access to these shared resources are 
globally synchronized among the many processes. Rather 
than assigning arbitration to the operating system, the 
present invention provides an arbitration procedure that is 
application-specific. This arbitration process provides a res- 
ervation mechanism for contending processes such that any 
given process only requests a lock call to the operating 
system when a lock is available for that process, thereby 
avoiding spinlock by the operating system. During the 
period between a lock request and a lock grant, a respective 
process is allowed to perform other useful work that does not 
need access to the shared resource. Alternatively during this 
period, the processor executing the respective process can 
execute another process that performs useful work that does 
not need the shared resource. Each process requesting a lock 
grant is informed of the expected delay period, placed on a 
reservation queue, and assigned a reservation identifier. 
After releasing the lock, the process uses the reservation 
queue to locate the next pending process to receive the lock. 

21 Claims, 11 Drawing Sheets 
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METHOD OF SEQUENCING LOCK CALL 
REQUESTS TO AN CMS TO AVOID 
SPINLOCK CONTENTION WITHIN A 
MULTI-PROCESSOR ENVIRONMENT 

BACKGROUND OF THE INVENTION 

(1) Field of the Invention 

The present invention relates to the field of computer 
systems. More specifically, the present invention relates to 
the field of arbitration for system resources between pro- 
cessors in a multi-processor environment 

(2) Prior Art 

Multi-processor computer systems are designed with mul- 
tiple processors that share certain resources, e.g., shared 
memory, shared data structures, shared input/output devices, 
etc. One such multi-processor system is the symmetric 
multi-processor system or SMP. In a multi-processor system 
environment, different software implemented processes 
(also called procedures) can be assigned to be executed by 
different processors of the system. To the extent that these 
different processes do not require access to shared resources, 
a significant amount of parallel processing can be performed 
by the multi-processor system, and the addition of more 
processors may increase overall data throughput Efficient or 
effective use of parallel processing greatly decreases the 
amount of processing time required to perform certain tasks, 
for instance, tasks performed by a relational database man- 
agement system (RDBMS). Therefore, many RDBMS sys- 
tems employ multi-processor systems to efficiently perform 
database tasks. 

System scaling refers to increasing system performance 
and processing efficiency by the addition of more processors 
within a multi-processor system. However, there are cases in 
which the addition of more processors within a multi- 
processor system can actually decrease overall system per- 
formance and processing efficiency. In one such case, com- 
petition or "contention"* between processors for the same 
shared resources that are available within the multi- 
processor system renders some processors inactive. 
Typically, the contention between processors in the multi- 
processor system is represented as contention between dif- 
ferent processes running on these processors. Therefore, 
contention is generally expressed as being process specific 
or as existing between processes, but nevertheless has enor- 
mous ramifications for the processors themselves. 

For instance, when shared system resources are needed by 
a number of processes at the same time, prior art contention 
handling routines that are built into an operating system 
(CVS) determine which process is first allowed access to the 
shared resource. The other processes that request access, but 
are not selected by the operating system, are then queued by 
the operating system, theoretically providing "fairness" for 
the resources since they are "shared." The operating system 
is typically informed of a process* need for shared resource 
by receiving a lock call from the process. This lock call 
specifies that a "lock" is desired over a particular shared 
resource by the process. The "lock" is an authorization to use 
the shared resource and is given to the process with the 
highest contention priority. In this fashion, global access 
synchronization is maintained between the various pro- 
cesses over the shared resource. 

However, within prior art lock contention arbitration (or 
spinlock contention arbitration) procedures that are built into 
operation systems, those processes that execute a lock call, 
but are denied access to the lock, are forced into a spin 
routine (spinlock) whereby their associated processors are 
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held up during a waiting period for the lock (also called 
"busy" period). During this waiting period, no useful work 
can be performed by the processor (e.g.. the processor is 
prevented from executing the process, or any processes at 

5 all, nor can the processor receive or service an interrupt). 
Generally, the only function the processor performs during 
this waiting period is to periodically check if the lock has 
been granted to it by the operating system. During spinlock, 
the processor is not even informed of the expected length of 

10 the delay period until lock grant is to take place. In these 
prior art operating systems, very high levels of spinlock 
contention, e.g.. on the order of multiple processors of 
compute cycles, may occur for specific application work 
loads and processor configurations. 

15 The above is particularly true for asynchronous resource 
requests where the prior art spinlock contention service 
routines spin creating a bottleneck which reduces SMP 
scaling as the processes cannot continue working. As shown 
below, the practice of placing multiple processors in spin- 

20 lock while waiting for their shared resources dramatically 
reduces the overall processing efficiency of any multi- 
processor system. 

In FIG. 1. a time snap-shot is shown (e.g., t=x) of an 
exemplary prior art queue 10 implemented by an operating 

25 system, with respect to a multi-processor system having 
eight processors (referred to as CPU 1 through CPU 8). At 
this time, process 2 of CPU2 currently is granted a lock 
corresponding to an arbitrary shared resource. Within queue 
10, slots 21-27 are held by processes 5, 10, 7, 9, 3. 11. and 

30 18 of CPUs 1, 4, 8, 3, 7, 6, and 5, respectively. Processors 
in queue 10 are held in spinlock waiting for the same lock. 
Priority ordering for access to this lock is represented by the 
processes' positions (1-7) within queue 10. In this state, no 
processors (aside from CPU 2) are performing useful work 

35 because the operating system has each process in a spinlock 
whereby they are merely waiting for their next turn to access 
the shared resource lock. Only CPU 2 is allowed to perform 
useful work. In mis inefficient model, the addition of more 
processors to the multi-processor system will not signifi- 

40 cantly increase processing efficiency of the overall system. 
This is the case because it is more than likely that the 
additional processors will be tied up in spinlock over vital 
shared resources rather than performing useful work. 
Therefore, spinlock contention between multiple processors 

45 over shared resources is a serious constraint on the degree of 
system scaling obtainable within a multi-processor system. 
Another drawback of this approach is that the size of the 
queue 10 (slots 21-27) is directly related to the number of 
processors in the multi-processor system because each pro- 

50 cesser is held in spinlock while in the queue and there are 
only 8 processors. 

To date, the practiced solution of the application designer 
has been to carefully design an application to insure minimal 
use of system resources that require spinlock 

55 synchronization, and subsequently accept the resulting 
multi-processor scaling achieved by the application, how- 
ever low that may be. This solution is unacceptable for at 
least two reasons. First, this approach forces application 
designers to tolerate low degrees of multi-processor scaling. 

60 Second, in many cases it is difficult to predict which system 
resources will create bad spinlock contention during execu- 
tion because the application designer does not know, a priori. 
(1) over which processor a typical process will execute in a 
multi-processor system nor (2) which processes the other 

65 processors are executing at any given time. As such, it is 
very difficult to accurately design an application to insure 
minimal use of system resources that require spinlock syn- 
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chroaization when the designer does not know the precise 
tuning of such demands and which processors will be 
making them. 

Accordingly, it would be advantageous to provide a 
multi-processor system that avoids the spinlock contention 
problems described above. Further, it would be advanta- 
geous to provide a multi-processor system that allows pro- 
cesses and processors that are contending for shared 
resources to perform useful work while their resource 
requests are pending. The present invention provides such a 
multi-processor system which uses an advantageous lock 
contention arbitration procedure that avoids the above dis- 
cussed shortcomings. 

SUMMARY OF THE INVENTION 

An arbitration procedure is described allowing processes 
and their associated processors to perform useful work while 
they have pending service requests for access to shared 
resources within a multi-processor system environment Hie 
arbitration procedure of the present invention is imple- 
mented within a multi-processor system (e.g., a symmetric 
multi-processor system) wherein multiple processes can 
simultaneously request access to "locks** which synchronize 
access to shared resources such that access to these shared 
resources are globally synchronized among the many pro- 
cesses. Rather man assigning arbitration to the operating 
system, the present invention provides an arbitration proce- 
dure that is application specific allowing an application to 
retain control of its processor while waiting for a lock grant 
The present invention sits on top of the operating system and 
requests locks of the operating system only when they are 
known to be available. 

This arbitration process provides a reservation mecha- 
nism for contending processes such that any given process 
only requests a lock call to the operating system when a lock 
is available for that process, thereby avoiding spinlock 
contention by the operating system. From the operating 
system's perspective, all service requests arrive sequentially 
and therefore obviate the operating system* s spinlock con- 
tention resolution mechanisms. According to the present 
invention, during the period between a lock request and a 
lock grant for a shared resource, a respective process is 
allowed to perform other useful work that does not need 
access to the shared resource. Also during this period, the 
processor executing a respective process can suspend the 
respective process and execute another process that per- 
forms useful work that does not need the share resource. 
Each process requesting a lock grant is informed of the 
expected delay period, placed on a reservation queue, and 
assigned a reservation identifier. After releasing the lock, the 
process uses the reservation queue to locate the next pending 
process to receive the lock. 

More specifically, embodiments of the present invention 
include a method in a multi-processor system having a group 
of processors coupled to a bus. a shared computer readable 
memory unit coupled to the bus, and a shared resource with 
an associated lock used for resource access synchronization, 
the method of queuing processes for access to the lock 
including the steps of: executing a plurality of processes by 
the group of processors; intercepting a first process of the 
plurality of processes requesting access to the lock of the 
shared resource; issuing to the first process a reservation 
identifier indicating a queue position of the first process with 
respect to processes stored in a reservation queue of the 
memory unit; storing an identifier of the first process and the 
reservation identifier into a entry of the reservation queue; 
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enabling the processes in the reservation queue, including 
the first process, to perform useful tasks while pending in the 
reservation queue; and upon a second process relinquishing 
the lock, using the reservation queue to identify a third 
5 process in the reservation queue and issuing an operating 
system service request to grant the third process access to the 
lock. 

BRIEF DESCRIPTION OF THE DRAWINGS 

10 FIG. 1 is a logical diagram of spinlocked processes in a 
queue of a prior art operating system implemented arbitra- 
tion system. 

FIG. 2A is a logical block diagram of a general purpose 
l5 multi-processor system environment operable for the 
present invention arbitration procedure. 

FIG. 2B is an exemplary multi-processor system using a 
symmetric multi-processor (SMP) architecture. 
FIG. 2C is a high level block diagram of processes and a 
20 spin lock queue of the present invention within the appli- 
cation space of a computer system and also illustrates die 
operating system space. 

FIG. 3 illustrates elements stored in computer readable 
memory used by the arbitration procedure of the present 
25 invention. 

FIG. 4 is a flow diagram of steps performed by the present 
invention when a process of the multi-processing system 
originates a lock call and enters the enter gate procedure. 
33 FIG. 5 is a flow diagram of steps performed by the present 
invention when a process enters the exit gate procedure. 

FIG. 6A and FIG. 6B represent a flow diagram of steps of 
the shared resource lock request arbitration procedure of the 
present invention. 
35 FIG. 7 is a flow diagram of steps performed by the present 
invention when a process releases a lock. 

FIG. 8 is a logical diagram of queued processes that are 
allowed to perform useful work within the arbitration pro- 
cedure of the present invention. 

40 

DETAILED DESCRIPTION OF THE 
INVENTION 

In the following detailed description of the present 
45 invention, a computer controlled system and method for 
performing access arbitration between multiple processes of 
a multi-processor system whereby queued processes are 
allowed to perform useful work, numerous specific details 
are set forth in order to provide a thorough understanding of 
50 the present invention. However, it will be obvious to one 
skilled in the art that the present invention may be practiced 
without these specific details. In other instances well known 
methods, procedures, components, and circuits have not 
been described in detail as not to unnecessarily obscure 
55 aspects of the present invention. 

Notation and Nomenclature 

Some portions of the detailed descriptions which follow 
are presented in terms of procedures, logic blocks, 

60 processing, and other symbolic representations of operations 
on data bits within a computer memory. These descriptions 
and representations are the means used by those skilled in 
the data processing arts to most effectively convey the 
substance of their work to others skilled in the art A 

65 procedure, logic block, process, step, etc., is here, and 
generally, conceived to be a self-consistent sequence of steps 
or instructions leading to a desired result The steps are those 



09/03/2003, EAST Version: 1.04.0000 



5,790.851 

5 6 

requiring physical manipulations of physical quantities. shared resources. It is appreciated that the SMP configura- 

Usually, though not necessarily, these quantities take the tion is exemplary only and that the lock arbitration proce- 

form of electrical or magnetic signals capable of being dure of the present invention is well suited for application to 

stored* transferred, combined, compared, and otherwise a variety of multi-processor environments. In the SMP 

manipulated in a computer system. It has proven convenient 5 configuration 110. eight processors 101(1)-101(8) are 

at times, principally for reasons of common usage, to refer shown. A portion of their shared (RAM) memory 102o is 

to these signals as bits, values, elements, symbols, also shown. Shared memory 102a contains data structures 

characters, terms, numbers, or the like. that are shared between the processors 101(1)-101(8). An 

It should be borne in mind, however, that all of these and input/output (I/O) shared resource 104a, e.g„ a disk drive, is 
similar terms are to be associated with the appropriate « also shown. A third exemplary shared resource 109 is also 
physical quantities and are merely convenient labels applied shown. Any of the shared resources 102a, 104a .and 1 109 can 
to these quantities. Unless specifically stated otherwise as be accessed by any processor of processors 101(1)-101(8) 
apparent from the following discussions, it is appreciated after its associated access request is synchronized by obtain- 
that throughout the present invention, discussions utilizing ing a lock. It is appreciated that in the general case, access 
terms such as 'Wessing" or "computing" or "calculating" « to the lock is performed for synchronization purposes only 
or "determining" or "displaying" or the like, refer to the and is not thereafter required for further use of the shared 
action and processes of a computer system (e.g., 112 of FIG. resource by the process. Therefore, the period of time that a 
2A). or similar electronic computing device, that manipu- particular process holds the associated lock is a predeter- 
lates and transforms data represented as physical (electronic) ^ and ^dily available value, 
quantities within the computer system's registers and memo- 20 Within the SMP configuration 110 of FIG. 2B. each 
ries into other data similarly represented as physical quan- processor of processors 101(1)-101(8) is capable of simul- 
tities within the computer system memories or registers or taneously executing a respective process. A scheduler within 
other such information storage, transmission or display the operating system, which is one of the executed 
devices. processes, determines which processor will execute which 

25 process and when in accordance with well know techniques. 

System Environment In order to avoid the spinlock contention between processors 

_ A . i-w/~, «< . . . ^ 101(1>-101(8), the present invention implements the arbi- 

Refer to FIG. 2A and FIG. 2B which tration mechanism that is described below, 
nents of a multi-processor computer system 112. Within the 

following discussions of the present invention, certain pro- ^ Arbitration Mechanism of the Present 

cesses (e.g.. processes 400, 500, 600. and 700) and steps arc Invention 
discussed that are realized, in one embodiment, as a series of 

instructions (e.g.. software program) that reside within com- It is appreciated that the arbitration mechanism 52 of the 

puter readable memory units of system 112 and executed by present invention sits on top of the operating system 60 and 

processors of system 112, When executed, the instructions 35 is application specific. As such, this arbitration mechanism 

cause the computer system 112 to perform specific actions 52 resides on the application side 55 of system 112 and is not 

and exhibit specific behavior which is described in detail to part of the operating system code. In this way. processors 

follow. control their own process while pending for a shared 

In general, computer system 112 of FIG. 2A used by the resource lock and are not placed into spinlock contention, 

present invention comprises an address/data bus 100 for 40 ^ arbitration mechanism 52, as further described below, 

communicating information, multiple processors in proces- calls the operating system 60 at determined points when a 

sor group 101 which are coupled with the bus 100 for lock is known to be available for a given process, 

processing information and instructions, a shared computer Specifically. FIG. 2C illustrates a high level block dia- 

readable volatile memory unit 102 (e.g., random access gram of the environment of the present invention. Proces- 

memory. static RAM, dynamic, RAM, etc.) coupled with the 45 sors 101(1)-101(8) can be executing multiple processes, 

bus 100 for storing information and instructions for the e.g., processes 1-n (referenced 41=45), simultaneously 

processors 101, a shared computer readable non-volatile where n is equal to or less than the number of processors 

memory unit (e.g.. read only memory, programmable ROM, within 101(1)^-101(8). Process arbitration mechanisms 52 of 

flash memory, EPROM, EEPROM, etc.) coupled with the the present invention (as well as the processes 1-n 

bus 100 for storing static information and instructions for the 50 themselves) reside within the application space 55 of system 

processors) 101. System 112 also includes a shared mass 112. The other space of system 112 being the operating 

storage computer readable data storage device 104 (hard system space 60. 

drive or floppy) such as a magnetic or optical disk and disk Rather than allowing each process 1-n to call the oper- 

drive coupled with the bus 100 for storing information and ating system 60 directly to obtain a lock, the present 

instructions. Optionally, system 112 can include a display 55 invention contains process arbitration rnechanisms 52 which 

device 105 coupled to the bus 100 for displaying informa- intercept the calls via interfaces 41a-45a and maintain a 

tion to the computer user, an alphanumeric input device 106 queue 220 situated within the application side 55. Each 

including alphanumeric and function keys coupled to the bus calling process of 1-n that does not immediately get a lock 

100 for communicating information and command selec- granted, is informed (via interfaces 41<a-45) of its place in 

tions to the processors 101, a cursor control device 107 go the queue 220 situated within 52. During the time while a 

coupled to the bus for communicating user input information process of 1-n is waiting in the queue 220 of the present 

and command selections to the processors 101, and a signal invention, it can perform useful processing tasks. The arbi- 

generating device 108 coupled to the bus 100 for commu- tration mechanisms 52 of the present invention then inter- 

nicating command selections to the processors 101. face directly with the operating system 60 and request locks 

FIG. 2B illustrates a logical block diagram 110 of the 65 from the operating system 60 via interface 60a using well 

multi processors 101 in more detail in a symmetric multi- known lock requesting and granting mechanisms. Since 

processor (SMP) configuration in conjunction with certain queuing is performed by 52, the operating system 60 is only 
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requested for a lock when one is available. By performing way in which lock calls are detected at step 410 is by 

process queuing in the application side 55. the present searching for $ENQ locking service requests ($ENQ lock 

invention advantageously eliminates spinlock contention calls) which are issued by processes operating within the 

created by the operating system 60 within processes 1-n and DEC OpenVMS operating system 60. If a process is 

therefore within processors 101(1H01(8). 5 ^tected at step 410 requesting a shared resource lock, then 

™. * in . .... , . . . at step 420, the normal operating system lock arbitration 

FIG. 3 illustrates components within the lock arbitration mechanism h ^ and 600 is executed (FIG. 6A 

mechanism of the present invention that are maintained and FIG. 6B). Process 400 then returns to step 410. The enter 

within a memory portion 102ft of memory unit 102 (FIG. gate proems 600 is described in more detail below and 

2A) and accessible by all processors 101<1)-101(8). The i nvo lves the placement of the calling process into the 

present invention maintains a reservation queue 220 stored "> reservation qucue 220 provided the shared resource lock is 

in memory ("a memory queue") for each lock of a shared busy ^ ^ way ^ shared resource lock arbitration 

resource. For exemplary purposes, the following discussion mechanism 0 f me present invention supplants the operating 

describes arbitration with respect to a data structure lock that system > s 5^ ^ arbitration. 

is required to perform I/O operations. However, it is aPf* c ' na 5 iii ustrate s a flow diagram of steps performed 

dated that the present invention is equally well suited for w ^ ^ t fof a ^ 

application to any lock of any shared resource within a exit gate routiDe 700 which handles the case when a process 

multi-processor environment 112. is finished with a share4 resource lock and is ready to release 

The reservation queue 220 of FIG. 3 contains a number of it to process stored within the reservation queue 220. 

entries 220<0)-220(n) that each relate to a respective queued Process 500 is implemented as application program instruc- 

process and each store two items: (1) a process identifier for ti ons stored in computer readable memory units of system 

the queued process, PROC_JD. stored in column 224; and i 12 and executed by a processor of system 112. At step 510, 

(2) a reservation identifier, RES J), assigned to the queued process 500 commences when a process first receives its 

process by the present invention and stored in column 226. requested shared resource lock. At step 515, the present 

Each time a process requests a lock call, the name or invention checks if the respective process is finished with 

'Identifier" of the process is stored in column 224 of a the shared resource lock and ready to release the lock If not 

vacant entry of reservation queue 220 and a reservation step 515 m $\ me process is finished with the lock at 

identifier is obtained from a latest pending service counter whicb ^ cxit gate routine 7#0 (FIG. 7) is entered. Process 

210 and also stored in the vacant entry in column 226. The 509 men returns. 

latest pending service counter 210 is then incremented. mG 6A aQd nQ 6B mustr ate steps within the enter gate 

Therefore, reservation queue 220 contains a listing 1 of all routme ^ of ±Q nt mvciltioj ,, Routinc 600 is imple- 

processes and their reservation identifiers that have indicated mcntcd ^ application program instructions stored in com- 

that they require access to the lock but have not yet been ^ memory units of system 112 and executed by 

granted the lock. a processor of system 112. Routine 600 is an application- 

The latest pending service counter (with a latest pending 35 specific process and operates apart from the operation sys- 

service register) 210 contains the value of the latest reser- tern of system 112. Process 600 avoids operating system 

vation identifier assigned to a process stored in the memory spinlock contention by ordering processes* operating service 

queue 220. The reservation identifiers are sequential and requests using the reservation queue 220. In accordance with 

indicate the place in line assigned to a particular process the present invention, an operating system service request 

within reservation queue 220. The reservation ordering ^ for a shared resource lock is not executed by a process until 

maintained within the reservation queue 220 is a first come, the lock is in effect ready to be granted to the process by the 

first serviced mechanism, e.g., also known as first in, first out operating system 60. 

(HhO). Process 600 is invoked in response to a process requesting 

FIG. 3 also contains an in-service counter and register a shared resource lock as detected within FIG. 4. Throughout 

215. The in-service counter value 215 contains the reserva- 45 the discussion of process 600. this invoking process is called 

tion identifier for the process that currently has access to the the current process. At step 610 of FIG. 6A. the count in the 

lock. Arrow 230 indicates that the in-service counter 215 is latest pending service counter 210 (FIG. 3) is incremented 

updated based on entries within reservation queue 220 by one in response to the current process requesting the lock, 

leaving to access the lock. Arrow 240 indicates that pro- Also at step 610, the present invention receives a process 

cesses with the latest pending service count are added to the 50 identifier (or name) of the current process. At step 615. the 

vacant entries of reservation queue 220 and are pending present invention then issues a reservation identifier to the 

access to the lock. A process can determine the relative current process. This reservation identifier is the current 

length of the reservation queue, e.g., its position in line, by count in the latest pending service counter 210 (after the 

comparing the latest pending service count it receives from increment of step 610). At step 620, the present invention 

the latest pending service counter 210 to the count value in 55 identifies a vacant entry within reservation queue 220 and in 

the in-service counter 215. that entry stores: (1) the process identifier of the current 

FIG. 4 illustrates a flow diagram of steps of process 400 process; and (2) the reservation identifier assigned to the 

performed within the present invention for placing a process current process at step 620. At the completion of step 620, 

of system 112 into the enter gate routine 600 to handle a lock the current process is assigned a place in line within reser- 

call service request made by the process. Process 400 is 60 vation queue 220. The place in line is determined by the 

implemented as application program instructions stored in reservation identifier as signed to the current process, 

computer readable memory units of system 112 and At step 625. the present invention checks if the shared 

executed by a processor of system 112. At step 410, the resource lock requested by the current process is already 

present invention monitors all call requests in system 112 to granted to another process (gate busy) or if it is available. If 

determine if any process executed within processors 101(1) 65 the lock is already granted; then step 625 functions as a lock 

-101(8) requests access to a shared resource lock. If not, step gate, and step 640 (FIG. 6B) is entered. At step 625. if the 

410 loops back to check again periodically. One exemplary shared resource lock is available for use. then step 630 of 
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FIG. 6A is entered. Al step 630. the present invention stores resource lock. Of Ihe three different actions possible for the 

the reservation identifier of the current process into the current process, this first action is (he only action that does 

in-service counter to represent that the current process is not allow the current process, or its processor, to perform 

granted the lock. any useful work while waiting for the lock grant This is the 

At step 635 of FIG. 6A. process 600 returns by issuing a 5 case because the expected waiting time is generally set to an 

standard shared resource lock call to the operating system 60 amount too short to perform any useful work. 

and the operating system 60 then grants the shared resource & °"« embodiment the small predetermined amount of 

lock to the current process. At step 635. an operating system time is that amount of time a process requires to hold tie 

call for the shared resource lock is only made with the lock ^fae releasing it. In another embodiment, the short 

advance knowledge that the lock is in fact free to be granted w duration can be adaptively adjusted such that it is set jurt 

to the current process by the operating system 60. In this smaller than the smallest time required to. perform any ^useful 

way. operating system spinlock contention is completely task within a task queue (described further below). At step 

avoided! the processor executing the current process is «5- * e P™*» 600 "W™ ^ "suing an operaung system 

never placed into spinlock by the operating system 60. service lock call to grant the current process the lock. 

, „„ ,„ . ... . is The second possible action taken by the current process is 

"J? ? ? T' 6B ' **?T* m ST£T « determined 650, where if the BqMcdL*^ 

mined that the shared resource lock requested by the ouren ^ h . duration <e . g ., ^ 

process is not available. In these cases the ™t prc^s ^ or mor e jesses from the next in line) 

mustwartfor^ ^ ^ m b entered ^ 67(K mc process 

work while waiting in accordance with the Fesentinven- T period of timed sleeji that corresponds to the 

Uon. At step 640 me present invention an ^ under fte Expe< ^_waitin^_Tirn C . During 

esUmated or exp^edwaibng time until the current rrocess J that iTexecuting the current process 

will be granted me IccL This expected waiting timeis based ^ mc ^ t ^ „ mcn allowed 

on the difference between the reservation identifier of the £ ^ £ o nefid ^ ^ 

current process and I the current count xn the m-semce can perform other useful work involving 

counter 215. This difference, M, yields the number of ^n^tt-only tasks that do not require oper 

processes that are pending wrfun the reservation queue 220 scr Xs. such as formatting data packets, 

ahead of the current process for thus shared resource lock. ^ moying ^ * 

As discussed above, the shared resource lock is needed by ^ originate f Tom ^ qucuc (described 

a process only during a short synchronization phase, which ^ It ^ appreciated that at the completion of the timed 

synchronizes access to the resource, and is not thereafter sleep of step 67(K ^ ^^0! invention men re-enters 

needed while the process then accesses the shared resource. step 640 and is thereafter generally placed into step 660, 

For instance, a shared resource lock to an I/O data structure Therefore, at step 670, the present invention allows proces- 

(in memory) 102a is required only to reserve a portion of sors associated with processes within the memory queue 220 

memory for the process and is thereafter not needed by the 35 t0 perform useful wor k by executing other processes rather 

process when rilling the portion of memory with data during man forced ^ spinlock modes. This increases over- 

the bulk of the I/O operation. Synchronization is required so ^ system efficiency. 

that two or more processes do not request the same memory ^ ^ possible action taken by the current process is 

portion at the same time. Therefore, the time each process determined at step 650. At step 650, if the Expected_ 

needs the lock is a constant and known period of time and ^ Waiting_Tirne is not too long, then it is of an intermediate 

does not depend on the amount of time the process requires ^ ^ ^ ^ 5 is entere<L At step 655. the present 

to perform its I/O operation (which can very widely). If the i nve ntion allows the current process to execute other useful 

period of time that a certain process needs a particular lock ^ ^ te completed given the duration of the 

is expressed as T, and the number of processes ahead of the r^pected_Waiting_Time. These other useful tasks can 

current process is M, then the expected waiting time as 45 i3w6tvt ^.bounded compute-only tasks that do not 

determined at step 640 can be expressed as: rcquirft operating system services, such as fonnatting data 

packets, calculating checksums, and moving data in memory 

Eipec^Wuit^Tm^M^ 102 Therefore, at step 655, the present invention allows 

processes within the reservation queue 220 to perform useful 

Step 640 of FIG. 6B, after computing the Expected_ 50 work rather than being forced into spinlock modes. This, like 

Waiting_Time, returns this value to the current process. step 670, increases overall system efficiency. 

Depending on the length of the Expected_Waiting_ In one embodiment, the present invention maintains a 

Time, the current process can perform one of three different global list of generic work items that can be performed 

actions to adaptively maximize the processor cycles avail- without operating system service requests. These generic 

able within system 112. 55 work items or tasks are maintained in the task queue 

First, as indicated by step 645, if the Expected_Waiting_ discussed above. The task queue maintains a listing of task 

Time is of a relatively short predetermined duration (e.g., the names and the expected processing time required for each to 

current process is next in line), then the present invention complete. As processes become available to perform useful 

will delay for some small predeterrnined amount of time at work at step 655. they can index this task queue and select 

step 660. This small predeterrnined amount of time is 60 and execute as many tasks as possible given their 

adjusted such that it is relatively close, but shorter, than the Expected_Wailingjrime. Furthermore, a processor within 

expected waiting time. Subsequently, after step 660, step step 670 can also access this task queue to select tasks to 

680 is entered whereupon the current process sits idle for a execute. 

brief period until an asynchronous interrupt places the At the completion of step 655, the current process enters 

current process into step 685. The activation of the synchro- 65 step 680 where it is momentarily forced into an idle state 

nous interrupt is described with reference to FIG. 7 and until it receives an asynchronous interrupt which forces 

originates from another process releasing the shared entry to step 685. This asynchronous interrupt originates 
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from the exit gate routine 706 of FIG. 7 and informs the 
current process that it is first in line to receive the lock. At 
this point, processing flows to step 685 of FIG. €B where the 
cuirent process is granted the lock. As described below, the 
process originating the asynchronous interrupt updates the 5 
in-service counter before step 685 is invoked. In some cases, 
a process within step 655 can be interrupted by the asyn- 
chronous interrupt In this case, step 685 is entered. 

In an alternative embodiment, in lieu of an asynchronous 
interrupt all processes within the reservation queue 220 that to 
are held in step 680 compare their reservation identifiers to 
the current value within the in-service counter 215. When 
the in-service counter 215 matches a particular process* 
reservation identifier, that process then obtains the lock via 
step 685. 15 

It is appreciated that process 600 is shown with respect to 
a current process, however, multiple processes are handled 
by process 600. In effect all processes stored in the reser- 
vation queue 220 can exist simultaneously within a state 
governed by a step of steps 660, 670, 655 or 680. Therefore, 20 
all processes within the reservation queue 220 can be within 
process step 655 performing useful work while waiting for 
a lock grant 

FIG. 7 illustrates steps of the exit gate routine 700 of the 
present invention. Routine 700 is implemented as applica- 25 
tion program instructions stored in computer readable 
memory units of system 112 and executed by a processor of 
system 112. Routine 700 is followed by any process that is 
relinquishing a shared resource lock. At step 710, the present 
invention receives the reservation identifier of the process 30 
that is returning the lock. At step 715, the present invention 
then compares this reservation identifier with all entries in 
column 226 of the reservation queue 220 to locate the entry 
within the reservation queue 220 corresponding to this 
returned reservation identifier. This located entry of the 35 
reservation queue 220 is then made vacant. At step 720, the 
in-service counter 215 is then incremented by one. 

At step 725 of FIG. 7, the present invention then compares 
all entries in column 226 of the reservation queue 220 with 
the value of the in-service counter 215. The entry matching 40 
the in-service counter 215 is then determined to be the 
process in the front of the line and next to receive the lock 
grant At step 725, the present invention determines the 
process identifier from this matching entry. At step 730, the 
present invention generates an asynchronous interrupt to the 45 
process having the process identifier as determined from 
step 725. This gives the process an indication that it can go 
ahead and the take the lock via an operating system call. The 
exit gate routine 700 then exits. In the alternative embodi- 
ment discussed with respect to FIG. 6B that does not use the 50 
asynchronous interrupt step 730 is not required since each 
process in the reservation queue 220 is self-monitoring for 
the lock release. 

An advantage of the shared resource lock arbitration 
mechanism of the present invention is that no modifications 55 
need be made on the operation system* s lock call proce- 
dures. The present invention ensures that when the operation 
system's lock call procedures are executed, no contention 
will exist between processes for the share resource lock. 

FIG. 8 illustrates an exemplary time snap shot of several 60 
processes pending in the reservation queue 220 for the 
shared resource lock Queue positions 810(l)-810(n) cor- 
respond to processes stored within respective entries of 
reservation queue entries 220(1) to 220(n). Gate queue 810 
represents the process possessing lock. In contrast to the 65 
queue 10 of FIG. 1, since processors 110(1)- 110(8) are not 
placed into spinlock pending their lock grant, many more 
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processes can be placed in the queue 220 in accordance with 
the present invention. In this case, processes 1 ... n are 
queued where n can be very much larger than 8. 

More importantly, while pending, the processors 110(1) 
-110(8) are allowed to perform useful work within system 
112. For instance, process 10 executing over GPU 4 (e.g., 
processor 110(4)), is shown in queue position 810(2). While 
pending for its lock grant, process 10 is free to perform 
useful work or processor 110(4) is free to execute another 
process that can perform useful work. Similarly, process 9 
executing over CPU 3 (e.g.. processor 110(3)), is shown in 
queue position 810(4). While pending for its lock grant 
process 9 is tree to perform useful work or processor 110(3) 
is tree to execute another process that can perform useful 
work. The same is true with respect to the other queue 
positions of 810(l)-810(n) as shown. 

One embodiment of the present invention has been imple- 
mented on a DEC OpenVMS using its $ENQ locking 
services as an example. This locking service is used very 
frequently by the TPC-C benchmark application (in excess 
of 25 K operations per second) and most other business 
application programs. Without the present invention, an 
8-processor Turbolaser computer system wastes an equiva- 
lent of 1.5 processors in spinlock mode for the lock service 
spinlock. With present invention shared resource lock arbi- 
tration mechanism, this waste is reduced to almost zero. The 
overhead of the present invention is one enter and exit call 
pair that jacket each of the common $ENQ and the $DEQ 
service calls. This overhead translates to a waste of less than 
one quarter of one processor. 

The preferred embodiment of the present invention, a lock 
arbitration mechanism for a multi-processor system, is thus 
described. While the present invention has been described in 
particular embodiments, it should be appreciated that the 
present invention should not be construed as limited by such 
embodiments, but rather construed according to the below 
claims. 

What is claimed is: 

1. In a multi-processor system having a group of proces- 
sors coupled to a bus, a shared computer readable memory 
unit coupled to said bus. and a shared resource with an 
associated lock used for resource access synchronization, a 
method of queuing processes for access to said lock com- 
prising the steps of: 

a) executing a plurality of processes by said group of 
processors; 

b) intercepting a first process of said plurality of processes 
requesting access to said lock of said shared resource; 

c) issuing to said first process a reservation identifier 
indicating a queue position of said first process with 
respect to processes stored in a reservation queue of 
said memory unit; 

d) storing an identifier of said first process and said 
reservation identifier into a entry of said reservation 
queue; 

e) enabling said processes in said reservation queue, 
including said first process, to perform useful tasks 
while pending in said reservation queue; and 

0 upon a second process relinquishing said lock, using 
said reservation queue to identify a third process in said 
reservation queue and issuing an operating system 
service request to grant said third process access to said 
lock. 

2. A method as described in claim 1 wherein said step b) 
comprises the steps of: 

monitoring said multi-processor system for $ENQ lock 
calls originating from said plurality of processes; and 
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intercepting a $ENQ lock call originating from said first 
process. 

3. A method as described in claim 1 wherein said step c) 
comprises the steps of: 

upon said first process requesting access to said lock of 5 
said shared resource, incrementing a latest pending 
service counter value maintained in said memory unit; 

accessing said latest pending service counter value to 
obtain said reservation identifier; and 

issuing said reservation identifier to said first process. 10 

4. A method as described in claim 1 wherein said step e) 
comprises the step of enabling said first process to perform 
a useful compute-only task that does not require operating 
system services, such as: formatting data packets; calculat- 
ing checksums ; and moving data in said memory unit while is 
said first process is pending in said reservation queue. 

5. A method as described in claim 4 further comprising the 
steps of: 

determining an expected waiting time before said first 
process is to obtain said lock; 20 

accessing a task queue to obtain a respective useful task 
that can be completed within said expected waiting 
time; and 

said first process executing said respective useful task 
while said first process is pending in said reservation 23 
queue for said lock. 

6. A method as described in claim 1 wherein said step e) 
comprises the step of placing said first process to sleep and 
enabling a processor executing said first process to execute 

a fourth process that performs a useful compute-only task 30 
that does not require operating system services, such as: 
formatting data packets; calculating checksums; and moving 
data in said memory unit while said first process is pending 
in said reservation queue. 

7. A method as described in claim 1 where said step f) 35 
comprises the steps of: 

upon said second process relinquishing said lock, incre- 
menting an in-service counter to indicate a next reser- 
vation identifier; 

40 

searching said reservation queue to locate said third 
process having said next reservation identifier; and 

generating an asynchronous interrupt to said third process 
causing said operating system to grant said third pro- 
cess said lock 

8. A multi-processor system comprising a group of 
processors, a shared computer readable memory unit and a 
shared resource with an associated lock used for resource 
access synchronization between a plurality of processes, 
said memory unit containing instructions that when executed w 
implement a method of queuing processes for access to said 
lock, said method comprising the steps of: 

a) executing said plurality of processes by said group of 
processors; 

b) intercepting a first and a second process, of said 55 
plurality of processes, requesting access to said lock of 
said shared resource; 

c) issuing to said first process a first reservation identifier 
that indicates a queue position of said first process with 
respect to processes stored in a reservation queue of 60 
said memory unit and issuing to said second process a 
second reservation identifier; 

d) storing an identifier of said first process and said first 
reservation identifier into a first entry of said reserva- 
tion queue and storing an identifier of said second 65 
process and said second reservation identifier into a 
second entry of said reservation queue; 
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e) enabling said first process and said second process to 
perform useful tasks while pending in said reservation 
queue; and 

f) upon a third process relinquishing said lock, using said 
reservation queue to identify said first process in said 
reservation queue and issuing an operating system 
service request to grant said first process access to said 
lock. 

9. A multi-processor system as described in claim 8 
wherein said step b) comprises the steps of: 

monitoring said multi-processor system for $ENQ lock 

calls from said plurality of processes; 
intercepting a first $ENQ lock call originally from said 

first process; and 
intercepting a second $ENQ lock call originally from said 

second process. 

10. A multi-processor system as described in claim 8 
wherein said step c) comprises the steps of: 

upon said first process requesting access to said lock of 
said shared resource, incrementing a latest pending 
service counter value maintained in said memory unit; 

accessing said latest pending service counter to obtain 
said first reservation identifier; 

issuing said first reservation identifier to said first process; 

upon said second process requesting access to said lock of 
said shared resource, incrementing said latest pending 
service counter value maintained in said memory unit; 

accessing said latest pending service counter to Obtain 
said second reservation identifier, and 

issuing said second reservation identifier to said first 
process. 

11. A multi-processor system as described in claim 8 
wherein said step e) comprises the step of enabling said first 
process to perform a first useful compute-only task that does 
not require operating system services, such as: formatting 
data packets; calculating checksums; and moving data in 
said memory unit while said first process is pending in said 
reservation queue. 

11 A multi-processor system as described in claim 11 
wherein said method further comprises the steps of: 
determining an expected waiting time before said first 

process is to obtain said lock; 
accessing a task queue to obtain said first useful task that 

can be completed within said expected waiting time; 

and 

said first process executing said first useful task while said 
first process is pending in said reservation queue for 
said lock. 

13. A multi-processor system as described in claim 11 
wherein said step e) further comprises the step of placing 
said second process to sleep and enabling a processor 
executing said second process to execute a fourth process 
that performs a second useful compute-only task that does 
not require operating system services, such as: formatting 
data packets; calculating checksums; and moving data in 
said memory unit while said second process is pending in 
said reservation queue. 

14. A multi-processor system as described in claim 8 
where said step f) comprises the steps of: 

upon said third process relinquishing said lock, incre- 
menting an in-service counter value to indicate a next 
reservation identifier; 

searching said reservation queue to locate said first pro- 
cess having said next reservation identifier; and 
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generating an asynchronous interrupt to said first process 
causing said operating system to grant said first process 
said lock. 

15. A shared memory unit in a multi-processor system 
having group of processors coupled to a bus and a shared 
resource with an associated lock used for resource access 
synchronization, said shared memory unit containing 
instructions stored therein that, when executed, cause said 
system to implement a method of queuing processes to 
access said lock comprising the steps of: 

a) executing a plurality of processes by said group of 
processors; 

b) intercepting a first process of said plurality of processes 
requesting access to said lock of said shared resource; 

c) issuing to said first process a reservation identifier 
indicating a queue position of said first process with 
respect to processes stored in a reservation queue of 
said memory unit; 

d) storing an identifier of said first process and said 
reservation identifier into a entry of said reservation 
queue; 

e) enabling said processes in said reservation queue, 
including said first process, to perform useful tasks 
while pending in said reservation queue; and 

f) upon a second process relinquishing said lock, using 
said reservation queue to identify a third process in said 
reservation queue and issuing an operating system 
service request to grant said third process access to said 
lock. 

16. A shared memory unit as described in claim 15 
wherein said step b) comprises the steps of: 

monitoring said multi-processor system for $ENQ lock 
calls originating from said plurality of processes; and 

intercepting a $ENQ lock call originating from said first 
process. 

17. A shared memory unit as described in claim 15 
wherein said step c) comprises the steps of: 

upon said first process requesting access to said lock of 
said shared resource, incrementing a latest pending 
service counter value maintained in said memory unit; 



K3,851 

16 

accessing said latest pending service counter value to 

obtain said reservation identifier; and 
issuing said reservation identifier to said first process. 
19. A shared memory unit as described in claim 15 
5 wherein said step e) comprises the step of enabling said first 
process to perform a useful compute-only task that does not 
require operating system services, such as: formatting data 
packets; calculating checksums; and moving data in said 
J0 memory unit, while said first process is pending in said 
reservation queue. 

19. A shared memory unit as described in claim 18 
wherein said method further comprising the steps of: 

determining an expected waiting time before said first 
15 process is to obtain said lock; 

accessing a task queue to obtain a respective useful task 

that can be completed within said expected waiting 

time; and 

said first process executing said respective useful task 
20 while said first process is pending in said reservation 
queue for said lock. 

20. A shared memory unit as described in claim 15 
wherein said step e) comprises the step of placing said first 
process to sleep and enabling a processor executing said first 

25 process to execute a fourth process that performs a useful 
compute-only task that does not require operating system 
services, such as: formatting data packets; calculating 
checksums; and moving data in said memory unit, while said 
first process is pending in said reservation queue. 
30 21. A shared memory unit as described in claim 15 where 
said step f) comprises the steps of: 
upon said second process relinquishing said lock, incre- 
menting an in-service counter to indicate a next reser- 
vation identifier; 

35 

searching said reservation queue to locate said third 

process having said next reservation identifier; and 
generating an asynchronous interrupt to said third process 
causing said operating system to grant said third pro- 
40 cess said lock. 

* ♦ * * » 
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ABSTRACT 



The present invention provides a method and apparatus for 
processing electronic mail in parallel. The present invention 
provides the ability to process mail objects in an electronic 
mail system in parallel. A message can be assigned to a 
queue. One or more processes can manage a plurality of 
messages in the queue. Each process can identify the next 
entry to be processed. Entries previously processed can be 
marked such that subsequent access is locked out. Mail 
objects and process information can be stored in a relational 
database system that provides the ability to perform locking 
at the record level. A process can be configured to perform 
a plurality of activities in a plurality of time periods. A 
guardian process can initiate or terminate other processes 
based on process information. Further, a guardian process 
examines system information periodically and identifies any 
need to initiate, restart, or stop one or more processes. 
Further, the guardian process can pass process information 
to an initiated process. One or more tables can be used to 
retain message information such as a instance table that 
includes a queue column. Further, tables can be used to 
retain process information such as process, process 
parameters, and process time tables. 

20 Claims, 24 Drawing Sheets 
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METHOD AND APPARATUS FOR Instead of assigning a message to a particular process, a 

PROCESSING ELECTRONIC MAIL IN message can be assigned to a queue. Further, multiple 

PARALLEL processes can be assigned to process a queue. Thus, as more 

activity causes the number of entries in a queue to increase, 

This is a continuation of application Ser. No. 08/660,737, 5 additional processes can be assigned to process the queue's 

filed Jun. 6, 1996, now abandoned, which is a continuation entnes - Similarly, as activity decreases and the number of 

of application Ser. No. 08/465,734 filed Jun. 6, 1995, now S^u^bt'SiSf CapablUty 10 

abandoned, which is a continuation of application Ser. No. a t j ueue can . . . « ■, 

08/175,159 filed Feb. 22, 1994 Issued U.S. Pat. No. 5,504, ^ach process can idenufy the next entry to be processed, 

' ' io aD tDen P rocess tne enlrv - Entnes previously processed can 

be marked such that processes that subsequently access the 

BACKGROUND OF THE INVENTION entry are aware that the entry nas been P rocessed - order I f 

for entry selection can be used. For example, queue entries S 

1. Field of the Invention can be placed in the queue in the order in which they are I' 

This invention relates to the field of parallel processing in l5 received. Further, priorities can be assigned to queue entries. 1 

an electronic mail environment. Thus, for example, each process can select queue entries on I 

2 Background Art a ^ ret ^ D ^ rst (FIFO) basis. Further, the FIFO selection I 

can be varied based on the priorities assigned to the queue 1 

Electronic mail messaging provides the ability to com- en tries 1 

municate information throughout an enterprise (e.g., send . , . , , . . . .... 

, . ^ * • \ ™ Any method can be used to identify queue entries previ- 

and receive messages and files between enterprise users). 20 3 . j u f *u 

& . c i -i ously or currently bemg processed by one process. In the 

Electronic mail users can send, for example, mail messages, / , , ' t &r . ' • e 

j- . ■ c jn preferred embodiment, messaging and process information 

scheduling messages, directory information, and files. r t , . . . ' , i ^ . *i_ . j ^ 

° ° J are stored m a relational database system that provides the 

Electronic mail systems provide the ability to perform abmty to perform locking at the record leveL Such a 

mail operations. For example, electronic mail operations relationa i database management system (RDBMS) is pro- 
include the ability to send and receive messages (i.e., maU ^ yided by 0rade Corporation. Messaging and process infor- 
or calendar scheduling messages directory information, and/ mation are stored in relationSj or tables> in the RDBMS. 
or files) Messages received by a user can be for example, A ess caQ bfi ^ {Q rform ^ fc ^ Qr 
read and/or forwarded to another mail user. Further, a user actiyities £ach ^ cafl be ^ d tQ rform Qne Qr 
can send a reply message to the sending user. Other opera- mofe rf ^ activities xs ^ bfi confi d 
tions may be provided to manage messages and files. fo mn dufing a certaifl ^ pefiod ^ fof exampk> 
Messages in a electronic mail systems can be grouped, or multiple processes can be configured to perform garbage 
queued, based on some like characteristic (e.g., the type of collection. A garbage collector process can be further con- 
further processing required). For example, a submission figured to, for example, clean up mail messages or scheduler 
queue can contain messages targeted for a particular loca- ^ messages, or clean up replication or directory registration 
tion. A rerouting queue can be used to store messages that information. Further, a garbage collector can be run at night 
need to be routed to another location. A notification queue t o perform garbage collection on mail messages. Another 
can contain a list of messages that have been placed in a gar bage collector can be run during the daytime to perform 
user's incoming mail box, and for which users are to receive garbage collection tasks. 

notification. A dead message queue can be used to identify The number and lype of processes can be determined or 
messages that are not deliverable or returnable to the sender. altered by a electromc mail system administrator. The 
A garbage collection queue can be used to contain messages m inverition can retain information related to the pre- 
trial can be removed from a system. Remote queues contain cesses Aparent procesSj the procesS) can initiate or 
messages bound for remote locations. Gateway queues con- terminate other processes. A guardian process can access 
tain messages destined for foreign messaging environments. ^ process information l0 determine what number and type of 
As the number of mail users increases, the number of processes to initiate. Further, the guardian process can 
messages to be processed by a mail system typically examine the system information at an interval of time to 
increases. Conversely, as the number of mail users determine what processes are running. Based on the system 
decreases, the number of messages decreases. If, for information and the process information, the guardian can 
example, messaging increases and processing capability to 5Q identify any need to initiate, restart, or stop one or more 
handle messaging remains constant, the number of messages processes. Further, the guardian process can pass process 
in the message queues such as the ones discussed above can identification and other process information to an initiated 
increase. Prior art systems provide the ability to serially process to assist the process in determining how to proceed, 
process messages, or queue entries. However, these systems ij sm g a RDBMS with record locking capability, queue 
do not provide the ability to scale processing (up or down) 55 entries can be stored ^ a database with each queue entry 
to accommodate a change in messaging activity. being a row in a data base relation, or table. As each entry is 

SUMMARY OF THE INVENTION for P rocessin S L th f row in the table that corresponds 

to the queue entry can be locked. Each process can examine 

The present invention provides the ability to scale an a snapshot of the queue and attempt to access the next queue 

electronic mail system. The present invention provides the 60 entry. If the entry is not locked, the entry can be selected for 

ability to process mail entries in parallel to accommodate processing. If the entry is locked, the entry cannot be 

increased messaging activity. Further, the present invention selected by a subsequent process. 

provides the ability to down scale processing capability to One or more tables can be used to retain message infor- 

accommodate decreases in messaging activity. mation. For example, an instance table can contain an entry 

The present invention provides the ability to scale a queue 65 for each instance of a message and retain queue information, 

such that a queue can be generic and have one or more This table can be examined by the processes to identify the 

processes manage a portion of messages in the queue. next message to be processed. 
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Additional tables can be used to retain process informa- The amount of traffic in a mail system can vary. As the 

tion. For example, a process table can contain a class mail activity varies, the number of items stored in a system 

designation, instance identifier, flags, timestamps (e.g., last queue can vary. For example, when mail activity increases 

wake time and last sleep time), and a process state (e.g., run while the ability to process the increased mail items remains 

or not run). Another table can be used to define general 5 stable, the number of mail items waiting to be processed can 

information for each class of processes. For example, fields increase. The present invention provides the ability to extend 

in the table can be used to assign names to the executables the processing capability of an electronic mail system to 

in each class. handle such increases in activity. That is, the present inven- 

A process parameters table contains parameter informa- tion provides the ability to process mail entries in parallel to 
tion for a process instance or for a class of processes, A io accommodate increased messaging activity, 

process can be configured for periods of dormancy between Conversely, when a decrease in mail activity occurs and 

work cycles (i.e., performing configured tasks). A process processing capability remains stable, some processing capa- 

time table is used to determine the periods in which a bility can become idle. The present invention provides the 

process is to remain dormant. For example, the table can ability to scale back processing capability to accommodate 

contain information regarding the time of day that a process 15 the reduced mail activity. 

is to run. The present invention provides the ability to scale a queue 

BRIEF DESCRIPTION OF THE DRAWINGS SUch ttat 3 queUe Can rt be g T,f ^ ■"i" ^ 

processes process a portion of the messages in the queue. 

FIGS. 1A-1E illustrate mail system queues and processes. Instead of assigning a message to a particular process, a 

FIG. 2 illustrates a process flow for a guardian process. 20 message can be assigned to a queue. Further, multiple 

FIG. 3 illustrates a guardianlnit process flow. processes can be assigned to process a queue. FIG. 1A 

rirp A .« , . . . n a illustrates a mail system queue 102 that contains mail entries 

FIG. 4 illustrates a terminate Process process now. w •« . . , + M ^ 

, .„ „ n 112A Mail entries 112A are assigned to queue 102. Server 

MO. 5 illustrates a startberver process How. A 1M and Server fi 1Q6 haye beeQ configured 

to process 

FIG. 6 illustrates a findChild process flow. 25 mail enlries in queue 102 . 

FIG. 7 illustrates a findProcess process flow. Server A 104 and Server B 106 one or more of 

FIG. 8 illustrates a spawnProcess process flow. entries 112A in queue 102 to process. In the present 

FIGS. 9-A-9D illustrate a postman process flow. invention, any selection technique can be used to select the 

FIG. 10 provides an example of a checkState process flow. 30 next entry or entries to be processed by a process. For 

FIG. 11 illustrates a performServerAction process flow example, queue entries can be selected from the queue in the 

for a Postman process or( * er * n wmcn tnev are received into the queue using a First 

™„ ii mi . ♦ \ i j i- a I Q First Out (FIFO) method. Further, priorities can be 

FIG. 12 illustrates a local message delivery process flow , v ' . , , . , 

° J r assigned to queue entries. Thus, the selection can be made 

f. . ,„ , , based on priorities assigned to queue entries. 

FIG. 13A illustrates a process table. 35 , . , 

™^ ,r, , , Thus, usmg a selection technique, each process process- 

13B P rovldes an exam P le of a class table stmcture * ing queue entries can identify the next entry or entries to be 

FIG. 13C provides an example of a process parameters processed. Once an entry has been processed, it can be 

ta bl e - marked to prevent another process from processing the 

FIG. 14A provides an example of a process tokens table. 4Q entry. Any method can be used to identify queue entries 

FIG. 14B provides an illustration of a process time table. previously or currently being processed by one process 

FIG. 15 provides an example of an instance queue table. without departing from the scope of the present invention. 

In the preferred embodiment, messaging and process 

DETAILED DESCRIPTION OF THE information are stored in a relational database system that 

INVENTION 45 p r0 vides the ability to perform locking at the record level. 

A method and apparatus for processing electronic mail in Such a relational database management system (RDBMS) is 

parallel is described. In the following description, numerous provided by Oracle Corporation. Using an RDBMS, mes- 

specific details (e.g., specific table entries) are set forth in saging (e.g., queue entries) can be stored in relations, or 

order to provide a more thorough description of the present tables, in the RDBMS. When a process selects a mail item 

invention. It will be apparent, however, to one skilled in the 50 (i* e > queue entry) for processing, the record that represents 

art, that the present invention may be practiced without these the item is locked. If another process attempts to select the 

specific details. In other instances, well-known features have same mail item from the queue, a locking exception is 

not been described in detail so as not to obscure the generated. Thus, subsequent processes can identify the 

invention. queue entries handled by another process. Other methods for 

Electronic mail systems store mail items while they wait 55 identifying items previously processed can be used without 

to be processed by the system. In the present invention, departing from the scope of the present invention, 

queues can be used to store mail items awaiting processing As mail system activity increases, there is an increase in 

by the mail system. For example, a mail message sent by one the number of mail items that must be processed by the mail 

mail user to another may be stored in multiple queues on its system. If processing capability remains stable, the number 

journey from the sender to the recipient. The message is 60 of mail entries in a queue can increase such as is illustrated 

maintained in a queue awaiting whatever processing is in FIG. IB. Queue 102 now contains entries 112B for 

needed. For example, a message being sent across a gateway processing by Servers A and B. 

to a user on another mail system may be stored in a remote To handle the increase in queue entries, additional pro- 
queue to await forwarding to the other system. Upon its cesses can be added as illustrated in FIG. 1C. In addition to 
arrival at the remote node, it can be placed in a rerouting 65 servers A and B, servers C and D have been configured to 
queue awaiting transmittal to the appropriate queue on the process entries 112B. Assuming a stable level of system 
remote node. activity, the additional processing capability can result in a 
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reduction in queue entries as illustrated in FIG. ID. A system Flags field 1308 can contain any number of flags to further 

administrator, upon viewing the situation illustrated in FIG. define a process instance. For example, flags 1308 can be 

ID, can determine that some of the processing capability used to particularize the tasks to be performed by a process 

assigned to queue 102 is not needed and can be removed. instance. Thus, multiple instances of a process class can 

FIG. IE illustrates queue 102 and a reduction in the pro- 5 handle some subset of the total tasks defined for the class. 

cessing capability illustrated in FIG. ID, That is, Servers The flags field for an instance of the postman process, for 

B-D in FIG. ID have been eliminated and one server (i.e., example, can be used to indicate that the postman instance 

server A) remains to handle the mail entries 112C. perform local delivery, remote delivery, gateway processing, 

n or notification. To illustrate further, the flags field for an 
Processes 

io instance of a garbage collector process can be used to 

Different types of processes can be used in the present indicate that the process cleanup registration records, or 

invention to perform mail system tasks. The following are perform scheduler, directory, or mail garbage collection, 

examples of processes and some of the tasks that can be A process instance can become dormant during execution, 

performed in a mail system. Additional process types and For examp i e> during its active state, a process can perform 

tasks can be used with the present invention without depart- is its tasks performing its task, the process can lay 

ing from its scope. Examples of types of processes include: dorma nt, or passive, for a period of time before becoming 

postman, scheduler, replicator, monitor, statistics, garbage active again and performing its defined tasks. LastWake- 

collector, and guardian. A process can be used to perform Time 1310 is t0 identify the time at which a process 

multiple, or different tasks or activities. Further, processes awoke from a dormant period LastSleepTime 1312 is used 

can be configured to run during a certain time period. The 20 t0 identify the time at which the process last entered into a 

number and characteristics of processes can be determined dormant period. ProcessState 1314 indicates the state of a 

or altered by a electronic mail system administrator based on process ( e g ? whet her or not the process should be run), 

such factors as system activity levels. Qass informatioo is stored in the dass table ^ ^ v6hn 

Apostman process, for example, delivers local mail items p roce ss can, for example, use the information contained in 

(e.g., scheduling and mail), remote mail items, handles 25 this table to determine names for an executable in each class, 

triggered mail items (e.g., return receipts and auto -forward), FIG. 13B provides an example of a class table structure, 

and send notification of new messages locally. A scheduler ciassld 1324 has the same meaning as in the process table, 

process can be used to handle scheduling requests. A repli- LoginName 1326 and password 1332 are used to authenti- 

cator process can be used to synchronize directory informa- cate lhe login t0> for example, the RDBMS. 

tion. A monitor process can be used to check message flow, 30 ^ ^ mg {& ^ ^ ^ & ^ fof 

database space usage, and process status. A garbage collec- {n a confi tion or managemem panel or report . 

tor process can remove unneeded mail items (e.g., unowned DomainId im c ^ be ^ fof s ( . a ^ 

messages) and reclaim the space used for these items. A betweeQ g ^ different ^ for ^ 

process, a guardian process, can act as the parent process for defined a licatioflS as defined b Ap p lication i d fie l d 1334. 

the other processes The parent process can start and then 35 ExecName 1336 ideQtifies ^ name Qf aQ executable mod . 

stop the processes. It can verify that the proper number of ^ fl mo(Jule ^ rf executioQ m ^ system) 

each process type is running. Subsystem field 1338 can be used to group together a variety 

Multiple processes, for example, can be configured to of individual processes into a single module (i.e., mad or 

perform garbage collection. A garbage collector process can schedular) 

be further configured to, for example, clean up mail or 40 A guardian process is responsible for invoking a process 

scheduler messages, or dean up replication or directory and { tQ ^ s {{s s dass and 

registration information. One of the garbage collector pro- instanceId value> A guardian process can access process 

cesses can run at night to perform garbage collection on mad mformation t0 determine what number and type of processes 

messages. Another garbage collector can be run during the tQ 

daytime to perform garbage collection tasks. 45 r ' .« 

J F & & Further, the guardian process can examine the system 

Data Tables information at an interval of time to determine what pro- 

System information can be stored such that it can be cesses are mnning. Based on the system information and the 

referenced intermittently during processing, and at system P rocess ^™tion, the guardian can identify any need to 

startup. In the preferred embodiment, this information is 50 vitiate, restart, or stop one or more processes. Further, the 

stored in a relational database system such as the relational S uardian P rocess ca ° P* 55 P rocess » dentlfica ^n and other 

database management system (RDBMS) provided by Oracle P rocess "Ration (i.e parameters) to an initiated process 

Corporation. Information stored in RDBMS tables includes to ^ the P rocess m determining how to proceed, 

messaging and processing information. Specific details used Parameters specific to each individual server can be 

to describe the type of information associated with mail and 55 defined in a process paramaters table. Further, generic 

processes is only for the sake of illustration. Additional or P rocess Parameters can be stored in the parameters table, 

different information can be used without departing from the 0nce a process is initiated, it is responsible for fetching any 

scope of the invention parameters in the parameters table. Further, each process can 

determine the frequency at which to refresh the values for its 

—Process Information— 60 parameters. FIG. 13C provides an example of a process 

A process is assigned a record in a process table. This parameters table, 

record is used by the guardian process as a request for Ciassld 1354 has the same meaning as previously 

invocation. FIG. 13A illustrates a process table. Classid described. InstanceId 1356 identifies a process instance as 

1304 contains an identification of a process class (e.g., previously described, or identifies that the record contains 

postman). InstanceId 1306 contains a unique value within a 65 generic, class parameters. That is, a null or zero value for 

particular process class. It differentiates among different instanceId 1356 indicates that the corresponding record 

instances of a particular class of process. contains class level parameters. These generic parameters 
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can be overridden by specific parameters (i.e., parameters 
specific to a process instance). Parameter 1358 identifies a 
particular parameter. The valueNum 1360, valueChar 1362, 
and valueDate 1364 fields contain the actual parameter 
values (i.e., of type number, character, and date, 5 
respectively). 

Each parameter for a process is paired with an identifying 
token. Tokens are stored in the process tokens table. Tokens 
can be described, for example, by the mail administrator. 
FIG. 14A provides an example of a process tokens table. 10 
Classld 1404 has the same description as previously 
described. Parameterld 1406 identifies a particular param- 
eter. ParameterType 1408 identifies parameter types (e.g., 
number, character and date). Name 1410 can be used to 
identify the token in a display. The description field 1412 can l5 
be used to provide a description or commentary for a token. 

Each process has associated record(s) in the process time 
table. The process time table is used to manage the wake and 
sleep times for a process instance. Process time table records 
can be used by an instantiated process to determine its actual 20 
requested Active and Passive (i.e., sleeping) times. FIG. 14B 
provides an illustration of a process time table. 

The classld field 1434 and instanceld field 1436 are the 
same as the similarly-named fields in the previously 
described tables. StartTime 1438 contains the value that 25 
identifies when a process begins its current state. The 
duration field 1440 indicates the length of time that a process 
is to remain in a state (e.g., active or dormant). The process 
will compare the startTime and duration values and the 
current time to determine whether or not it is to change 30 
states. 

The flags field 1442 is used to specify the desired state 
during this designated time. For example, a flags value may 
indicate active to specify that the associated process is to be 
active at this time, or it may be used to indicate passive to 35 
specify that the process is meant to be dormant during this 
time. Processes can be tuned with this parameter having a 
different value during different times of the day. The sleep- 
Time field 1444 indicates the delay (e.g., in minutes) 
between cycles. The state field 1446 indicates the state of the 40 
process (e.g., active or passive). The runlndex 1448 indi- 
cates a run state that is examined for changes. 

— Mail Objects — 

As previously indicated, mail objects (e.g., messages) can 45 
be retained in tables in an RDBMS. Tables can indicate one 
or more queues to which a mail object belongs. A message 
can be contained in more than one queue. For example, a 
message sent to both local and remote users can be contained 
in multiple queues (e.g., a local delivery queue and remote 50 
delivery queue). Further, information associated with mail 
objects can be stored in tables such as an instance table. An 
instance table entry contains information associated with a 
message instance. FIG. 15 provides an example of an 
instance table. 55 

Each object is identified by an identifier that is unique at 
each node. The msgld 1504 provides this unique identifica- 
tion. Using a unique message identifier, for example, pro- 
vides the ability to relate additional mail object information 
in other tables with a given mail object. Folderld 1506 60 
provides ownership and location information. For example, 
a user's inbox value is stored in the folderld field value for 
new, unread or read messages. Or a gateway outbox value is 
stored in the folderld field for a message awaiting submis- 
sion to a gateway. 55 

Apriority field 1508 identifies a mail object's priority. As 
previously indicated, the priority can be used to determine 
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the order in which mail objects are processed. The flags field 
1510 provides additional information associated with a mail 
object. For example, whether or not the owner of a message 
is a blind carbon copy recipient. The retentionDate and 
receivedDate fields (i.e., 1512 and 1514, respectively) pro- 
vide time stamp information that can be used, for example, 
in garbage collection or as the entry time of a message in a 
queue. Status 1516 indicates the state of a mail object (e.g., 
new or unread). 

The queue field 1518 defines the queue in which the 
associated mail object instance resides. This field can be 
examined by a process to determine the mail objects to be 
processed in a particular queue. For example, a postman 
process that is configured to perform a notification task may 
examine the instance table to identify objects in the notifi- 
cation queue that are to be processed. 

Guardian 

A guardian process determines the number and type of 
processes to initiate based on configuration information 
supplied by the mail system administrator. In the preferred 
embodiment, this information is stored in relations in an 
RDBMS as previously described. However, any method of 
retaining configuration information can be used with the 
present invention. 

Further, the guardian process retains a snapshot of current 
processes, and can obtain a new snapshot. Based on a 
comparison of the two snapshots and the configuration 
information, the guardian can determine whether or not to 
initiate, restart, or stop one or more processes. Further, the 
guardian process can pass process identification and other 
process information to a process. A guardian process can act 
as the parent process for other processes. It spawns or 
terminates a process after it verifies the proper number of 
each process type. 

FIG. 2 illustrates a process flow for a guardian process. At 
decision block 202 (i.e., "any signal from a child process?"), 
if there is no signal from a child process, processing con- 
tinues at block 204 to block any restart signals and to get any 
previously generated restart or terminate signals that have 
not been processed. Processing continues at decision block 
206. If a signal is received from a child process, processing 
continues at decision block 206. 

At decision block 206 (i.e., "terminate signal?"), if the 
signal is a terminate signal, processing continues at block 
208 to mark all processes spawned by the guardian as 
obsolete (i.e., terminable). Processing continues at process- 
ing block 210. If, at decision block 206, the signal is not a 
terminate signal, processing continues at block 210 to 
invoke Guardianlnit to, for example, generate a new process 
snapshot. At block 212, terminateProcess is invoked to kill 
the appropriate processes. At block 214, startProcess is 
invoked to start the appropriate processes. 

At decision block 216 (i.e., "any child processes still 
running?"), if there are no spawned processes running, 
processing ends at block 218. If spawned processes are 
running, processing continues at block 220 to unblock the 
restart signal. At block 222, guardian waits for a signal. 
Signals can be generated by a child or as a result of system 
administrator input. When guardian receives a signal, pro- 
cessing continues at block 224. At block 224, findChild is 
invoked to identify the processed associated with the signal 
generator. At decision block 226 (i.e., "child found?"), if the 
signal generator is unknown, processing continues at block 
216. 

If, at decision block 226, the signal generator is identified, 
processing continues at block 228. At block 228, the 
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respawo variable is set to include the run and restart alter- — StartServer — 

natives. At decision block 230 (i.e., "child terminated and n . . ... t , , ~ r 

a . . . t . . , v „„v , . . , Processes can be initiated by a guardian process, for 

configured to run if it terminates? ), if a terminated process . t * i_ ji-** i 

& c , . . ... * • *• ■ example, upon a system startup, or when additional pro- 
is configured to be restarted upon termination, processing . ,. r , . , . t v ■ • l, -i 
& . . i i -lai * « ' • • • , cessing capability is needed to handle an increase in mail 
continues at block 232 to reset respawn to indicate 5 . & _ .„ . . t i0 a A . 
« . t „ j • * i • • ui i 11^ activity. rlG. 5 illustrates a startServer process now. At 
stateRun and processing continues at decision block 234. , . . . , , r (( ; T ( , . 
Tf . . F * j ■ ■ t_i i 1*3/4 decision block 522 (i.e., "all processes in table memory 
If not, processing continues at decision block 234. , nm r n , , -, 

r & processed? ), if all processes have been processed, process- 
At decision block 234 (i.e., "process state for child i ng returns at block 524. If not, processing continues at block 
process & respawn=respawn"), if the state of a child process 526 t0 get the next process in the process snapshot, 
(i.e., signal generator) is to be respawned based on the value 10 At . . . , MO r u , t . . ft . 
v c ' & & . ' f . At decision block 528 (i.e., "process state= run ?), if the 
of the respawn variable, processing continues at block 236 . t e ., , /. • / 
j . , K r ,. u „ - * j ■■ state of the process is not set to run, processing continues at 
to invoke spawnChild. Processing conUnues at decision , . . - j r .t. 
ui i if ■< ■ j , j 4 i , «. , . decision block 522 to process any remainder of the pro- 
block 216. If it is determined that the child process is not TC • 
. 4jJi , . . *• * j • • cesses. If the process state is equal to run, processing 
intended to be respawned, processing continues at decision . t , . . U1 , e ~ A . . , . . . , ; _. A r 

v & 1( - continues at decision block 530. At decision block 530 (i.e., 

block 216. 15 « .j . 0 „, . c | * 

process already running? ), if the process is already 

Guardianlnit running, processing continues at decision block 522 to 

process the remaining processes. If not, processing contin- 

Guardianlnit is invoked in the guardian process flow to, ues at decision block 532 . 

for example, fetch a new process snapshot from the „ 4 . . c , . . . . . 

RDBMS. FIG. 3 illustrates a guardianlnit process flow. At 20 , 0n f techm 3 ue for d f erm,mn e whether ° r not ? P rocess 15 

block 302, an RDBMS connection is established. At block already "I' mvolves "■'""8 a ° f ™' n S 



304, the number of processes in the process table is deter- 
mined. This count can be used, for example, for memory 



processes and their associated class and instance identifica- 
tions. Thus, the list of executing processes can be examined 



UI1UK/U. 1 lllO VUUUt wail V\s UO^U, lV^l VAOUJUIV, 1U1 LU^UIUl * , . . 1.1 „ i , 

„ 4 . a n * * j • ui 1 iw *u * * to determine whether or not a process is already running, 

allocation purposes. As illustrated in block 306, the count is A , . . 4 , t - e .if 

, . ii * j,v i c ,l 25 Any other method can be used without departing from the 

used to allocate any additional memory tor the process r *u • 

•r j * . * »j- j j scope of the invention, 

information data structures stored in memory and accessed r 

by the guardian process. processing block 532, spawnProcess is invoked to 

a II , » AO . . . f t . , j. initiate the process. Processing continues at decision block 

At block 308, a new process snapshot is fetched from the r . . , 

process table. At block 310, the restart bit in the processState 3Q 522 t0 P rocess remamln 8 sna P shot en,nes - 

field of the process table is turned off. At block 312, a node findChild 

state variable is set to "shut down." At decision block 314 

(i.e., "at least one process in table with state= < run , 7"), if the ^ findChild process can associate an executing process 

snapshot contains at least one process that is to be run, a process snapshot entry based on like process infor- 

processing continues at block 316 to set the node state 35 mation ( e -g-> classId and instanceld). It can be invoked, for 

variable to "operational," and processing continues at block example, to determine which child process terminated. FIG. 

318. If not, processing continues at block 318. At block 318, 6 illustrates a findChild process flow. At decision block 602 

the state of the node is set to the node state variable. (i- e > " an entries in spawned process list checked?"), if all 

Processing returns at block 320. entries in a list of executing processes has been processed, 

processing returns at 604 with a return code to indicate that 

— TerminateProcess — 40 no child process was found (e.g., a negative one). 

Process termination can, for example, occur when it is If a ^ entries have not been processed, processing contin- 

determined that a surplus of processing capability exists for ues at block 606 to obtain the next entry in the list. At 

a given queue. For example, a mail system administrator decision block 608 (i.e., "child's pid-desired pid?"), if the 

monitoring system activity may determine that a queue that 45 entry's process identification (e.g., process identification 

is being managed, or handled, by two Postman processes, generated by the operating system when the process was 

can be managed by one Postman process. The system initiated) is not the same as the desired pid (e.g., the pid 

administrator can generate a signal for the guardian to accompanying the process' termination signal), processing 

terminate one of the Postmen. Further, processes may be continues at decision block 602 to examine the remaining 

terminated when shutting down a mail system. 50 entries in the spawned process list. 

FIG. 4 illustrates a terminateProcess process flow. At If it is the same, processing continues at block 610 to 

decision block 402 (i.e., "all processes in process data delete the entry from the spawned process list. At block 612, 

structure processed?"), if all of the processes in the guard- findProcess is invoked to identify the terminated process' 

ian's process snapshot have been processed, processing entry in the process snapshot. The location of the process in 

returns at 404. If not, processing continues at processing 55 the process snapshot is returned at block 614. 
block 406. At block 406, the next spawned process' classId 

and instanceld are identified. At block 408, findProcess is —findProcess— 

invoked to locate a snapshot entry corresponding to the An entry in the process snapshot can be obtained using the 

spawned process' classId and instanceld. findProcess flow illustrated in FIG. 7. At decision block 702 

At decision block 410 (i.e., "process not found or process 60 (i.e., "all snapshot entries checked?"), if all snapshot entries 

state>=restart or process stateorun?"), if the entry in the have been checked, processing returns at 704 with a return 

process snapshot is to run or be restarted upon termination, code to indicate that no snapshot entry was found (e.g., a 

or was not found, processing continues at decision block 402 negative one). If not, processing continues at block 706 to 

to process any remaining children. If not, processing con- get the next snapshot entry. 

tinues at block 412 to terminate the execution of the process. 65 At decision block 708 (i.e., "current snapshot entry's 

Processing continues at decision block 402 to process any classId and instanceld=desired class and instance?"), if the 

remaining processes. current entry has the same class and instance identification 
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as the desired class and instance information, processing 
returns at block 710 with a snapshot entry identification. If 
not, processing continues at decision block 702 to process 
any remaining snapshot entries. 

5 

— spawnProcess — 

FIG. 8 illustrates a process flow, spawnProcess, for 
spawning a child process. At block 802, the process state is 
set to "restart." At block 804, a parameter list (e.g., classld 
and instanceld) is generated to pass to the spawned process. 10 
The process is spawned (e.g., using a fork operation in Unix) 
and the parameters are passed to the new process at block 
806. At block 808, information associated with the spawned 
process is stored in the spawned process list (e.g., class, 
instance, spawned process' system identification). At block 15 
810, the spawnedProcess counter is incremented by one. 
Processing returns at block 812. 

Spawned Process 

Different types of processes can be used in the present 20 
invention to perform mail system tasks. A process' configu- 
ration is determined from the information in the process 
tables (e.g., process, class, process parameters, process time, 
and process tokens tables). A configuration can, for example 
define the tasks to be performed by a spawned process. 25 
Examples of types of process' and associated tasks have 
been described previously. Additional or different process 
types and tasks can be used without departing from the scope 
of this invention. 

As previously described, a postman process can deliver 
local mail objects, remote mail objects, handle triggered 
mail objects (e.g., return receipts and auto-forward), and 
send local users notification of new messages. FIG. 9A 
illustrates a postman process flow. At block 902, passCount 35 
is set to zero and exitCode is initialized to "Restart." At 
block 906, any parameters sent by the invoking process are 
obtained. At block 910, checkState is invoked to determine 
the state of a process and system. 

At decision block 912 (i.e., "system shutdown or 40 
restarted?"), if the system was shutdown or restarted, pro- 
cessing continues at block 914 to set the exitCode to 
"noRestart," and processing continues at block 926. If not, 
processing continues at decision block 916. At decision 
block 916 (i.e., "runFlag=' Active'?"), if the flag indicates 45 
that the process is not meant to be active during this period, 
processing continues at decision block 940. 

If the process is meant to be active, processing continues 
at decision block 918. At decision block 918 (i.e., "server's 
actions contained in command-line argument?"), if configu- 50 
ration information was passed to the process, processing 
continues at block 922. If not, processing continues at block 
920 to get the postman flags for this process from the process 
parameters table. At block 922, the process table's wake- 
Time timestamp associated with this process is updated. At 55 
block 924 performServerAction is invoked to perform the 
tasks configured for this process. Processing continues at 
decision block 940. 

— Dormancy — 6Q 

A process can be configured to sleep after completing a 
processing pass or cycle wherein the process attempts to 
perform tasks for which it is configured to perform. As 
illustrated in the Postman process flow, the dormancy stage 
of processing can be initiated when the postman completes 65 
one cycle of processing, or when the process* flags indicates 
that the process is meant to be dormant. 
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FIG. 9C illustrates the dormancy preparation for the 
postman process. At decision block 940 (i.e., "run multiple 
passes?"), if a process is not meant to run more than one 
pass, or cycle, processing continues at block 942 to set 
exitCode to "noRestart" and processing continues at block 
926. If the process is configured to run multiple passes, 
processing continues at block 943. At block 943, a pass- 
Count is incremented by one. At decision block 944 (i.e., 
"passCount=number of passes desired?"), if the configured 
number of passes have been performed, processing contin- 
ues at block 946 to set exitCode to "noRestart" and pro- 
cessing continues at block 926. 

If the number of multiple cycles has not been achieved, 
processing continues at block 948 to invoke checkState. At 
decision block 950 (i.e., "system shutdown or restarted?"), 
if the system was shutdown or restarted, processing contin- 
ues at block 952 to set the exitCode to "noRestart," and 
processing continues at block 926. If not, processing con- 
tinues at decision block 954. At decision block 954 (i.e., 
"delay time contained in command-line arguments?"), if the 
delay time was passed to the process, processing continues 
at block 956 to assign the passed value to the delayTime 
variable, and processing continues at decision block 968. 

If not, processing continues at decision block 958. At 
decision block 958 (i.e., "runFlag=' Active*?"), if the flag 
indicates that the process is not meant to be active during 
this period, delayTime is set to the time remaining until it is 
to be active, and processing continues at decision block 968. 
If the flag indicates that the process is meant to be active, 
processing continues at decision block 962. 

At decision block 962 (i.e., "specific delay time set for 
this server?"), if there is a delayTime value for this process, 
processing continues at block 966 to set the delayTime 
variable to this time. Processing continues at decision block 
968. If not, a default delayTime is used at block 964, and 
processing continues at decision block 968. 

At decision block 968 (i.e., "reLogin indicator contained 
in command-line arguments?"), if the indicator for logging 
back into the RDBMS was passed to the process, processing 
continues at block 970 to use the passed value to set the 
reLogin variable. Processing continues at block 978. If not, 
processing continues at decision block 972. 

At decision block 972 (i.e., "specific reLogin indicator set 
for this type of server?"), if a specific reLogin value is set, 
this value is used to set the reLogin variable at block 976, 
and processing continues at block 978. If not, processing 
continues at block 974 to use a default for this class of 
process, and processing continues at block 978. 

At block 978, the time at which the process becomes 
dormant (i.e., the sleepTime timestamp) is stored in the 
associated entry in the process table. At decision block 980 
(i.e., "reLogin="Yes"?"), if reLogin is positive, registration 
of the process is deleted, and processing continues at block 
984. If it is negative, processing continues at block 984, At 
block 984, the process becomes dormant for the number of 
minutes determined by the value of delayTime. After 
awaking, the process continues at block 910 to determine the 
state of the system. 

— Process Completion — 

As previously indicated, a process can complete after one 
or more cycles, or upon some other indication. FIG. 9B 
illustrates a Postman process flow anticipating process 
completion. At block 926, work areas are freed. At decision 
block 928 (i.e., "connected to database?"), if the process is 
not connected to the database, processing continues at block 
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934, If it is, processing continues at block 930 to rollback queue entries can be stored in a database with each queue 

any uncommitted transactions. Processing continues at entry being a row in a database relation, or table (e.g., the 

block 934. instance table). As each entry is selected for processing, the 

At block 934, any space acquired during processing (e.g., row in the table that corresponds to the queue entry can be 

contexts) is freed. Processing returns at block 936 to the 5 locked. Each process can examine a snapshot of the queue 

invoking process with the value of exitCode. The exitCode and attempt to access the next queue entry. If the entry is not 

value can be used by the invoking process (e.g., guardian) to locked, the entry can be selected for processing. If the entry 

determine whether or not this process is to be restarted. is locked, the entry cannot be selected by a subsequent, 

inquiring process. 

checkState 10 pjQ \± includes an example of a postman's local message 

The checkState process checks the state of the system and delivery configurable task. FIG. 12 illustrates a local mes- 

a particular process. FIG. 10 provides an example of a sage delivery process flow including locking. At block 1202, 

checkState process flow. At block 1002, either specific or message and folder identification is obtained from the 

generic (where there are no specific values) startTime, instance table where the queue is a specified queue value and 

duration, flags, sleepTime, processState, and mnlndex val- 15 the retentionDate is less than or equal to the system date. The 

ues are obtained from the proclime table. At block 1004, selected messages are ordered by priority and retentionDate. 

runFlag variable is updated from the flags value. The The number of messages obtained is set in msgCount at 

remainingTime is determined from the startTime and dura- processing block i 20 4. At decision block 1206 (i.e., "all 

tion values at block 1006. At decision block 1012 (i.e., messages proce ssed?"), if all of the selected mes- 

"system shutdown or restarted, or change in runlndex?"), if 20 sages are processed> pr0C essing returns the number of suc- 

the system has shutdown or been restarted or there has been cessM deUveries at block 120 8. If not, processing continues 

a change in runlndex, processing returns a warning at block at Wock m2 tQ ^ the next mail object from those 

1014. If not, processing returns a successful value at block A for update causes the message tQ be 

locked from other access. Further, such a selection locks out 



1016. 

— performServerAction — ^ subsequent attempts to access the record. Thus, at decision 

™„ ,i -n * . c o . , • „ block 1214 (i.e., "locking error?"), if a locking error occurs, 

FIG. 11 illustrates a performServerAction process now . v , ' , ° / . & 

- ~ 4 j • ■ i_i i ihm /• « * the process logs the error and processing continues at 

for a Postman process. At decision block 1102 (i.e., "remote , , t fl A , , , , £ 

a in m -c fu * a - * iL n * decision block 1206 to select another from any remaining 

flag set.' ), it the remote nag is set, the Postman performs ' ° 

remote instance deletion at processing block 1104 and 30 messa & es ' 

processing continues at decision block 1106. If not, process- If a locking error does not occur, processing continues at 

ing continues at decision block 1106. At decision block 1106 decision block 1218 to handle other exceptions. If it is 

(i.e., "gateway flag set?"), if the gateway flag is set, the determined that the message has already been delivered, 

Postman performs gateway processing at processing block processing continues at block 1220 to generate a log entry. 

1108 and processing continues at decision block 1110. If not, 35 Processing continues at decision block 1206 to process any 

processing continues at decision block 1110. remaining messages. If not, processing continues at decision 

At decision block 1110 (i.e., "local flag set?"), if the local block 1222 " other errorr >' ^T^™"' 

flag is set, the Postman performs message forwarding to error occurs, processing returns at block 1238. 

remote sites at processing block 1112 and processing con- If D0 locking or other error occurs, processing continues 

tinues at decision block 1114. If not, processing continues at 40 at block 1226, to deliver the message. At block 1228, a log 

decision block 1114. At decision block 1114 (i.e., "local flag entrv » generated to log the status of the delivery. At 

set?"), if the local flag is set, the Postman performs local decision block 1230 (i.e, "delivery error?"), if a delivery 

message delivery at processing block 1116 and processing error did not processing continues at block 1234 to 

continues at decision block 1118. If not, processing contin- increment the successful delivery counter, and processing 

ues at decision block 1118. 45 continues at decision block 1206. If a delivery error 

At decision block 1118 (i.e., "notification flag set?"), if the occurred, processing continues at block 1232 to unblock the 

notification flag is set, the Postman performs notification at message, and processing continues at decision block 1236. 

processing block 1120 and processing continues at decision At decision block 1236 (i.e., "fatal?"), if the error is a fatal 

block 1122. If not, processing continues at decision block error, processing returns at block 1238. If not, processing 

1122. At decision block 1122 (i.e., "local flag set?"), if the 50 continues at decision block 1206 to process any remaining 

local flag is set, the Postman performs local message for- messages. 

warding at processing block 1124 and processing continues Thus, a method and apparatus for processing electronic 

at decision block 1124. If not, processing continues at mail in parallel has been provided. 

decision block 1126. What is claimed is: 

At decision block 1126 (i.e., "remote flag set?"), if the 55 1. A method for processing electronic mail objects in 

remote flag is set, the Postman performs remote message parallel in a computer system, the method comprising the 

delivery at processing block 1128 and processing continues computer-implemented steps of: 

at decision block 1130. If not, processing continues at receiving electronic mail objects in a data structure; 

decision block 1130. At decision block 1130 (i.e., "gateway a first rocess spawning chiId pr0C esses; 

-flag set?"), if the gateway flag is set, the Postman performs 60 , t 

, /' c 4 ' x r , t said child processes processing said electronic mail 

non-local delivery 01 remote messages to gateway nodes at h i 11 l* 

processing block 1132 and processing returns at decision " ' 

block 1134. If not, processing returns at decision block 1134. a P rocess deteclul g a chan g e m the number of said 

electronic mail objects in said data structure; 

Mail Object Locking 65 a thifd process spawn i ng at i east one additional child 

To perform configured tasks, one or more processes must process in response to an increase in the number of said 

access records in a queue. In the preferred embodiment, electronic mail objects in said data structure; 
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a fourth process terminating at least one of said child 
processes in response to a decrease in the number of 
said electronic mail objects in said data structure; and 

at least one of said child processes processing additional 
electronic mail objects until terminated by said fourth 
process. 

2. The method of claim 1 wherein said step of spawning 
child processes, said step of detecting a change in the 
number of said electronic mail objects, said step of spawning 
at least one additional child process and said step of termi- 
nating at least one of said child processes are performed by 
a single process. 

3. The method of claim 1 wherein said step of receiving 
electronic mail objects in a data structure comprises the step 
of queuing said electronic mail objects in a queue. 

4. The method of claim 1 wherein said step of detecting 
a change in the number of said electronic mail objects 
comprises the step of detecting a change in the size of said 
data structure. 

5. The method of claim 1 wherein said step of processing 
comprises the steps of one of said child processes: 

selecting one of said electronic mail objects from said 

data structure; 
locking said electronic mail object to prevent selection 

thereof by another of said child processes; 
processing said electronic mail object; and 
removing said electronic mail object from said data 

structure. 

6. The method of claim 5 wherein said step of selecting 
comprises the step of selecting an electronic mail object 
having a higher priority than others of said electronic mail 
objects. 

7. The method of claim 5 wherein said step of selecting 
comprises the step of selecting an electronic mail object 
placed in said data structure before others of said electronic 
mail objects. 

8. The method of claim 5 wherein said step of placing 
electronic mail objects in a data structure comprises the step 
of storing each of said electronic mail objects in a record of 
a data base and said step of locking said electronic mail 
object comprises the step of locking the record in which said 
electronic mail object is stored. 

9. In an electronic mail system, a method of dynamically 
scaling processing capability comprising the steps of: 

said electronic mail system receiving electronic mail 
objects; 

a first process spawning child processes; 

said child processes processing said electronic mail 
objects in parallel; 

a second process configured to detect an increase in 
processing activity of said child processes and to spawn 
at least one additional child process in response to 
detecting the increase; and 

a third process configured to detect a decrease in process- 
ing activity of said child processes and to terminate at 
least one of said child processes in response to detect- 
ing the decrease. 

10. The method of claim 9 wherein said step of spawning 
child processes, said step of detecting an increase in pro- 
cessing activity and said step of delecting a decrease in 
processing activity are performed by a single process. 

11. A data processing system having at least one data 
processing node, said data processing system comprising: 

a data structure on said data processing node receiving 
electronic mail objects; 
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child processes processing said electronic mail objects in 
parallel; 

a first process detecting a change in the number of said 
electronic mail objects in said data structure; 
5 a second process spawning an additional child process in 
response to an increase in the number of said electronic 
mail objects; 

a third process terminating a child process in response to 
a decrease in the number of said electronic mail 
objects; and 

at least one of said child processes processing additional 
electronic mail objects until terminated by said third 
process. 

15 12. The data processing system of claim 7 wherein said 
first process is configured to detect said change in the 
number of said electronic mail objects by detecting a change 
in the size of said data structure. 

13. The data processing system of claim 7 wherein a 
2Q single process comprises said first process, said second 

process and said third process. 

14. An article of manufacture including one or more 
computer readable media having program code stored 
thereon, the program code including instructions which, 

25 when executed by a processor, cause the processor to: 

spawn child processes to process in parallel electronic 

mail objects received in a data structure; 
detect a change in the number of the electronic mail 
objects in the data structure; 
30 spawn at least one additional child process in response to 
an increase in the number of the electronic mail objects 
in the data structure; and 
terminate at least one of the child processes in response to 
a decrease in the number of the electronic mail objects 
35 in the data structure, at least one of the child processes 
processing additional electronic mail objects until ter- 
minated. 

15. The article of claim 14 wherein detecting the change 
in the number of the electronic mail objects includes detect- 

40 ing a change in the size of the data structure. 

16. The article of claim 14 wherein spawning child 
processes to process the electronic mail objects includes 
spawning at least one child process to: 

select one of the electronic mail objects from the data 
45 structure; 

lock the selected electronic mail object to prevent selec- 
tion thereof by another of the child processes; 
process the selected electronic mail object; and 

remove the selected electronic mail object from the data 
50 4 4 J 
structure. 

17. An article of manufacture including one or more 
computer readable media having program code stored 
thereon, the program code including instructions which, 

55 when executed by a processor, cause the processor to: 

spawn child processes to process electronic mail objects 
in parallel; 

detect an increase in processing activity of the child 
processes; 

60 spawn at least one additional child process in response to 
detecting the increase; 
detect a decrease in processing activity of the child 

processes; and 
terminate at least one of the child processes in response to 
65 detecting the decrease. 

18. A computer system including a processor and a 
memory coupled to the processor, the memory having stored 
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therein a data structure to receive electronic mail objects, 
and program code which, when executed by the processor, 
causes the processor to: 

spawn child processes to process the electronic mail 
objects in the data structure in parallel; 

detect a change in the number of the electronic mail 
objects in the data structure; 

spawn at least one additional child process in response to 
an increase in the number of the electronic mail objects 
in the data structure; and 

terminate at least one of the child processes in response to 
a decrease in the number of the electronic mail objects 
in the data structure, at least one of the child processes 
processing additional electronic mail objects until ter- 
minated. 
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19. The computer system of claim 18 wherein detecting 
the change in the number of the electronic mail objects 
includes detecting a change in the size of the data structure. 

20. The computer system of claim 18 wherein spawning 
5 child processes to process the electronic mail objects 

includes spawning at least one child process to: 
select one of the electronic mail objects from the data 
structure; 

lock the selected electronic mail object to prevent selec- 
10 tion thereof by another of the child processes; 
process the selected electronic mail object; and 
remove the selected electronic mail object from the data 
structure. 

***** 
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